VIBRANT Trial - Primary Analyses

Author

Laura Symul (UCLouvain), Laura Vermeren (UCLouvain), Susan Holmes (Stanford University)

Published

July 29, 2025

Code
library(tidyverse)
library(magrittr)
library(gt)
library(patchwork)
library(SummarizedExperiment)
library(tidySummarizedExperiment)
library(brms)
library(gtsummary)
library(labelled)
library(scales)
# library(gridExtra)
library(ggpubr)
# library(mia) # BiocManager::install("mia")

theme_set(theme_light())
tmp <- fs::dir_map("R", source)
rm(tmp)
Code
tmp <- fs::dir_map("../VIBRANT-99-utils/R/", source)
rm(tmp)

This document details the analyses and provides results for the primary analyses of the VIBRANT trial.

Trial data

The data used for these analyses are stored in an R Bioconductor MultiAssayExperiment object which contains the QCed and pre-processed data. The code and description of how raw data have been processed and QCed is available on the VIBRANT-0-QC-and-preprocessing GitHub repository (Still private at the moment, but will be made public upon publication, and anyone wishing to have access can email Laura Symul).

Code
mae_file <- 
  fs::dir_ls(str_c(data_dir(), "04 unblinded MAEs/"), regexp = "mae_full_.*\\.rds$") |> 
  sort(decreasing = TRUE) |>
  magrittr::extract(1)

cat(str_c("Loading the RDS file containing the unblinded VIBRANT MAE: /", mae_file |> str_remove(data_dir()), "\n"))
Loading the RDS file containing the unblinded VIBRANT MAE: /04 unblinded MAEs/mae_full_20250715.rds
Code
mae <- readRDS(mae_file)

rm(mae_file)

fs::dir_ls(str_c(data_dir(), "01 Preprocessed and QCed/"), regexp = "exposures_.*\\.Rdata$") |> 
  sort(decreasing = TRUE) |>
  magrittr::extract(1) |> 
  load()

Demographics (Table 1 & suppl. tables by sites)

Summary table

The demographics table is built as follow.

We first create a table1_data data.frame from the participant_crfs_merged table stored in the @metadata slot of the mae.

This table contains almost all variables needed to create the demographics table (i.e., site, age, race, ethn, cut_size_meals, eat_less, hungry_did_not_eat, sexual_partners_lifetime, sexual_partners_past_month, contraception_method).

In addition, the mae@colData contains the participants’ arm and study population flags (here we keep the mITT population).

From the variables listed above, we create the variable food_insecurity using the three variables cut_size_meals, eat_less and hungry_did_not_eat: if a participant answered “Yes” to any of the corresponding questions, then she is considered to be affected by food insecurity.

Code
table1_data <- 
  metadata(mae)$participant_crfs_merged |> 
  dplyr::left_join(
    colData(mae) |> as_tibble() |> select(pid, site, arm, ITT, mITT, PP) |> distinct(), 
    by = join_by(pid, site)
    )  |> 
  filter(mITT) |> 
  select(pid, site, arm, age, race, ethn, cut_size_meals, eat_less, hungry_did_not_eat, sexual_partners_lifetime, sexual_partners_past_month, contraception_method) |> 
  mutate(
    food_insecurity = ifelse(
      cut_size_meals == "Yes" | eat_less == "Yes" | hungry_did_not_eat == "Yes",
      "Yes",
      "No"
    ),
    food_insecurity = factor(food_insecurity, levels = c("No", "Yes"))
  ) |> 
  select(-c(eat_less, cut_size_meals, hungry_did_not_eat))
Code
table1_data |> 
  ggplot(aes(x = food_insecurity)) +
  geom_bar() +
  labs(x= "Food insecurity", y = "Count")+
  theme_classic()

In addition to these variables, we also include the participants’ partner’s gender.

Participants are asked about their partner’s gender at all weekly visits (1000, 1100, 1200, 1300, 1400, 1500, 1700, 1900, 2120) using CRF 19.

Answers to that question can be found in the past_week_sex_partner column of the mae@metadata$visits_crfs_merged table. Participants’ answer throughout the weekly visits will be summarized into one of these 4 categories:

  • No sex
  • Only male
  • Only female
  • Both

This variable is added to the table1_data table.

Code
gender_sexual_partner <- 
  metadata(mae)$visits_crfs_merged |>
  select(pid, past_week_sex_partner) |>
  filter(!is.na(past_week_sex_partner)) |>
  group_by(pid) |>
  summarise(
    gender_sexual_partner = 
      case_when(
        all(past_week_sex_partner == "No sex in the past week") ~ "No sex",
        any(past_week_sex_partner %in% c("A single male partner", "Multiple male partners")) &
          !any(past_week_sex_partner %in% c("A single female partner", "Multiple female partners")) ~ "Only man",
        any(past_week_sex_partner %in% c("A single female partner", "Multiple female partners")) &
          !any(past_week_sex_partner %in% c("A single male partner", "Multiple male partners")) ~ "Only female",
        any(past_week_sex_partner %in% c("A single male partner", "Multiple male partners")) &
          any(past_week_sex_partner %in% c("A single female partner", "Multiple female partners")) ~ "Both",
        TRUE ~ NA_character_
      ),
    .groups = "drop"
  )

table1_data <- 
  table1_data |> 
  left_join(gender_sexual_partner, by = "pid")

We also re-code the race variable as follows:

  • “Asian or South Asian” is re-coded to “Asian” (for brevity)
  • “Black, African American, or African” and “African” are re-coded to “Black/African” (for brevity)
  • “Coloured”, “White”, “Other”, and “Prefered not to answer” are left as is.
Note

@Disebo & Caroline: do we go for British or American English? (Colored vs Coloured)?

Code
table1_data <-
  table1_data |>
  
  mutate(
    across(where(is.character), as.factor),
    
    race = fct_recode(
      race,
      "Asian" = "Asian or South Asian",
      "Black/African" = "Black, African American, or African",
      "Black/African" = "African",
      "Coloured" = "Coloured",
      "White" = "White",
      "Other" = "Other",
      "Prefer not to answer" = "Prefer not to answer"
    ),
    
    race = fct_relevel(race, "Asian", "Black/African", "Coloured", "White", "Other", "Prefer not to answer"),
    race = race |> fct_drop(),
    arm = arm |> fct_drop()
  ) |>
  
  set_variable_labels(
    food_insecurity = "Report food insecurity in past 12 months ",
    sexual_partners_past_month = "Number of partners in past month",
    sexual_partners_lifetime = "Number of partners in lifetime",
    ethn = "Ethnicity", 
    contraception_method = "Contraception", 
    gender_sexual_partner = "Gender of sexual partners"
  )
Warning

@Laura Vermeren: I changed the summary statistics from mean to median for the continuous variables (for number of past sexual partner, the mean was artificially high in the arm with the 90 partner MGH participant). Would you be able to check that the tests selected by the gt_summary package are appropriate for the data we have? They should be, but I’d love if we could double-check :) >LV : I check and I think it is good :)

Table 1 (Population characteristics by arm)

Code
table1_data |> 
  select(-pid) |>
  rename_with(~ str_to_title(.x)) |>  
  tbl_summary(
    by = Arm,
    type =
      list(
        Sexual_partners_past_month ~ "continuous",
        Sexual_partners_lifetime ~ "continuous"
      ),
    statistic = list(
      all_continuous() ~ "{median} ({min}-{max})",
      all_categorical() ~ "{n} ({p}%)"
    )
  )  |> 
  add_p(include = -Site)
Characteristic Pl
N = 191
LC-106-7
N = 211
LC-106-3
N = 151
LC-106-o
N = 151
LC-115
N = 201
p-value2
Site





    CAP 14 (74%) 14 (67%) 14 (93%) 14 (93%) 14 (70%)
    MGH 5 (26%) 7 (33%) 1 (6.7%) 1 (6.7%) 6 (30%)
Age 29 (20-38) 29 (18-40) 26 (21-34) 25 (19-35) 30 (20-40) 0.5
Race




0.5
    Asian 1 (5.3%) 0 (0%) 1 (6.7%) 0 (0%) 0 (0%)
    Black/African 14 (74%) 18 (86%) 14 (93%) 15 (100%) 18 (90%)
    White 2 (11%) 3 (14%) 0 (0%) 0 (0%) 1 (5.0%)
    Other 1 (5.3%) 0 (0%) 0 (0%) 0 (0%) 1 (5.0%)
    Prefer not to answer 1 (5.3%) 0 (0%) 0 (0%) 0 (0%) 0 (0%)
Ethnicity




0.8
    Not hispanic/latino/spanish 3 (60%) 6 (86%) 1 (100%) 1 (100%) 4 (67%)
    hispanic/latino/spanish 2 (40%) 1 (14%) 0 (0%) 0 (0%) 2 (33%)
    Unknown 14 14 14 14 14
Number of partners in lifetime 5 (1-10) 5 (1-90) 4 (1-10) 4 (1-20) 5 (1-10) 0.8
Number of partners in past month 1 (1-2) 1 (0-2) 1 (0-2) 1 (0-3) 1 (0-2) 0.4
Contraception




>0.9
    Continuous combined oral contraceptive pills (skip placebo) 3 (16%) 5 (24%) 1 (6.7%) 1 (6.7%) 3 (15%)
    Contraceptive vaginal ring 0 (0%) 1 (4.8%) 0 (0%) 0 (0%) 0 (0%)
    Cyclic combined oral contraceptive pills 0 (0%) 0 (0%) 0 (0%) 0 (0%) 1 (5.0%)
    Depo provera 11 (58%) 10 (48%) 12 (80%) 11 (73%) 11 (55%)
    Hormonal implant 1 (5.3%) 0 (0%) 0 (0%) 0 (0%) 0 (0%)
    Hormone containing IUD 2 (11%) 2 (9.5%) 0 (0%) 0 (0%) 2 (10%)
    Net-EN 2 (11%) 2 (9.5%) 2 (13%) 3 (20%) 2 (10%)
    Other 0 (0%) 1 (4.8%) 0 (0%) 0 (0%) 1 (5.0%)
Report food insecurity in past 12 months 7 (37%) 7 (33%) 6 (40%) 5 (33%) 6 (30%) >0.9
Gender of sexual partners




0.8
    No sex 1 (5.3%) 3 (14%) 1 (6.7%) 1 (6.7%) 3 (15%)
    Only man 18 (95%) 18 (86%) 14 (93%) 14 (93%) 17 (85%)
1 n (%); Median (Min-Max)
2 Kruskal-Wallis rank sum test; Fisher’s exact test; Pearson’s Chi-squared test

Study population characteristics appear balanced across arms.

Population characteristics by site

Code
table1_data |> 
  select(arm, age, race, site, food_insecurity, contraception_method, sexual_partners_past_month, sexual_partners_lifetime) |>
  rename_with(~ str_to_title(.x)) |>  
  tbl_summary(
    by = Site,
    type =
      list(
        Sexual_partners_past_month ~ "continuous",
        Sexual_partners_lifetime ~ "continuous"
      ),
    statistic = list(
      all_continuous() ~ "{median} ({min}-{max})",
      all_categorical() ~ "{n} ({p}%)"
    )
  )  |> 
  add_p()
Characteristic CAP
N = 701
MGH
N = 201
p-value2
Arm

0.2
    Pl 14 (20%) 5 (25%)
    LC-106-7 14 (20%) 7 (35%)
    LC-106-3 14 (20%) 1 (5.0%)
    LC-106-o 14 (20%) 1 (5.0%)
    LC-115 14 (20%) 6 (30%)
Age 26 (18-37) 33 (22-40) <0.001
Race

<0.001
    Asian 0 (0%) 2 (10%)
    Black/African 70 (100%) 9 (45%)
    White 0 (0%) 6 (30%)
    Other 0 (0%) 2 (10%)
    Prefer not to answer 0 (0%) 1 (5.0%)
Report food insecurity in past 12 months 28 (40%) 3 (15%) 0.038
Contraception

<0.001
    Continuous combined oral contraceptive pills (skip placebo) 5 (7.1%) 8 (40%)
    Contraceptive vaginal ring 0 (0%) 1 (5.0%)
    Cyclic combined oral contraceptive pills 0 (0%) 1 (5.0%)
    Depo provera 54 (77%) 1 (5.0%)
    Hormonal implant 0 (0%) 1 (5.0%)
    Hormone containing IUD 0 (0%) 6 (30%)
    Net-EN 11 (16%) 0 (0%)
    Other 0 (0%) 2 (10%)
Number of partners in past month 1 (0-3) 1 (0-2) 0.4
Number of partners in lifetime 4 (1-20) 10 (4-90) <0.001
1 n (%); Median (Min-Max)
2 Fisher’s exact test; Wilcoxon rank sum test; Pearson’s Chi-squared test

There are, unsurprisingly, striking differences in study population characteristics by site.

Population characteristics by arm, split by site

Code
table1_data |> 
  filter(site == "CAP") |> 
  select(-c(pid, site)) |>
  rename_with(~ str_to_title(.x)) |>  
  tbl_summary(
    by = Arm,
    type =
      list(
        Sexual_partners_past_month ~ "continuous",
        Sexual_partners_lifetime ~ "continuous"
      ),
    statistic = list(
      all_continuous() ~ "{median} ({min}-{max})",
      all_categorical() ~ "{n} ({p}%)"
    )
  ) |> 
  add_p()|>
  modify_caption("**Site = CAP**")
Site = CAP
Characteristic Pl
N = 141
LC-106-7
N = 141
LC-106-3
N = 141
LC-106-o
N = 141
LC-115
N = 141
p-value2
Age 26 (20-36) 26 (18-36) 26 (21-34) 25 (19-35) 25 (20-37) >0.9
Race




>0.9
    Asian 0 (0%) 0 (0%) 0 (0%) 0 (0%) 0 (0%)
    Black/African 14 (100%) 14 (100%) 14 (100%) 14 (100%) 14 (100%)
    White 0 (0%) 0 (0%) 0 (0%) 0 (0%) 0 (0%)
    Other 0 (0%) 0 (0%) 0 (0%) 0 (0%) 0 (0%)
    Prefer not to answer 0 (0%) 0 (0%) 0 (0%) 0 (0%) 0 (0%)
Ethnicity





    Not hispanic/latino/spanish 0 (NA%) 0 (NA%) 0 (NA%) 0 (NA%) 0 (NA%)
    hispanic/latino/spanish 0 (NA%) 0 (NA%) 0 (NA%) 0 (NA%) 0 (NA%)
    Unknown 14 14 14 14 14
Number of partners in lifetime 4 (1-10) 3 (1-7) 4 (1-10) 4 (1-20) 4 (1-9) 0.8
Number of partners in past month 1 (1-2) 1 (1-2) 1 (0-2) 1 (0-3) 1 (1-2) 0.2
Contraception




0.9
    Continuous combined oral contraceptive pills (skip placebo) 1 (7.1%) 2 (14%) 0 (0%) 0 (0%) 2 (14%)
    Contraceptive vaginal ring 0 (0%) 0 (0%) 0 (0%) 0 (0%) 0 (0%)
    Cyclic combined oral contraceptive pills 0 (0%) 0 (0%) 0 (0%) 0 (0%) 0 (0%)
    Depo provera 11 (79%) 10 (71%) 12 (86%) 11 (79%) 10 (71%)
    Hormonal implant 0 (0%) 0 (0%) 0 (0%) 0 (0%) 0 (0%)
    Hormone containing IUD 0 (0%) 0 (0%) 0 (0%) 0 (0%) 0 (0%)
    Net-EN 2 (14%) 2 (14%) 2 (14%) 3 (21%) 2 (14%)
    Other 0 (0%) 0 (0%) 0 (0%) 0 (0%) 0 (0%)
Report food insecurity in past 12 months 7 (50%) 5 (36%) 6 (43%) 5 (36%) 5 (36%) >0.9
Gender of sexual partners




>0.9
    No sex 1 (7.1%) 2 (14%) 1 (7.1%) 1 (7.1%) 2 (14%)
    Only man 13 (93%) 12 (86%) 13 (93%) 13 (93%) 12 (86%)
1 Median (Min-Max); n (%)
2 Kruskal-Wallis rank sum test; Fisher’s exact test; NA; Pearson’s Chi-squared test
Code
table1_data |> 
  filter(site == "MGH") |> 
  select(-c(pid, site)) |>
  rename_with(~ str_to_title(.x)) |>  
  tbl_summary(
    by = Arm,
    type =
      list(
        Sexual_partners_past_month ~ "continuous",
        Sexual_partners_lifetime ~ "continuous"
      ),
    statistic = list(
      all_continuous() ~ "{median} ({min}-{max})",
      all_categorical() ~ "{n} ({p}%)"
    )
  ) |> 
  add_p()|>
  modify_caption("**Site = MGH**")
Site = MGH
Characteristic Pl
N = 51
LC-106-7
N = 71
LC-106-3
N = 11
LC-106-o
N = 11
LC-115
N = 61
p-value2
Age 32 (26-38) 33 (24-40) 25 (25-25) 35 (35-35) 33 (22-40) 0.7
Race




0.13
    Asian 1 (20%) 0 (0%) 1 (100%) 0 (0%) 0 (0%)
    Black/African 0 (0%) 4 (57%) 0 (0%) 1 (100%) 4 (67%)
    White 2 (40%) 3 (43%) 0 (0%) 0 (0%) 1 (17%)
    Other 1 (20%) 0 (0%) 0 (0%) 0 (0%) 1 (17%)
    Prefer not to answer 1 (20%) 0 (0%) 0 (0%) 0 (0%) 0 (0%)
Ethnicity




0.8
    Not hispanic/latino/spanish 3 (60%) 6 (86%) 1 (100%) 1 (100%) 4 (67%)
    hispanic/latino/spanish 2 (40%) 1 (14%) 0 (0%) 0 (0%) 2 (33%)
Number of partners in lifetime 9 (7-10) 15 (6-90) 7 (7-7) 10 (10-10) 6 (4-10) 0.059
Number of partners in past month 1 (1-2) 1 (0-2) 1 (1-1) 1 (1-1) 1 (0-2) >0.9
Contraception




>0.9
    Continuous combined oral contraceptive pills (skip placebo) 2 (40%) 3 (43%) 1 (100%) 1 (100%) 1 (17%)
    Contraceptive vaginal ring 0 (0%) 1 (14%) 0 (0%) 0 (0%) 0 (0%)
    Cyclic combined oral contraceptive pills 0 (0%) 0 (0%) 0 (0%) 0 (0%) 1 (17%)
    Depo provera 0 (0%) 0 (0%) 0 (0%) 0 (0%) 1 (17%)
    Hormonal implant 1 (20%) 0 (0%) 0 (0%) 0 (0%) 0 (0%)
    Hormone containing IUD 2 (40%) 2 (29%) 0 (0%) 0 (0%) 2 (33%)
    Net-EN 0 (0%) 0 (0%) 0 (0%) 0 (0%) 0 (0%)
    Other 0 (0%) 1 (14%) 0 (0%) 0 (0%) 1 (17%)
Report food insecurity in past 12 months 0 (0%) 2 (29%) 0 (0%) 0 (0%) 1 (17%) 0.8
Gender of sexual partners




>0.9
    No sex 0 (0%) 1 (14%) 0 (0%) 0 (0%) 1 (17%)
    Only man 5 (100%) 6 (86%) 1 (100%) 1 (100%) 5 (83%)
1 Median (Min-Max); n (%)
2 Kruskal-Wallis rank sum test; Fisher’s exact test

Study population characteristics appear balanced across arms at the CAPRISA site. Given the low number of participants at the MGH site in two of the five arms, we create an additional table excluding these two arms.

Code
table1_data |> 
  filter(site == "MGH", !(arm %in% c("LC-106-o", "LC-106-3"))) |> 
  mutate(arm = arm |> fct_drop()) |> 
  select(-c(pid, site)) |>
  rename_with(~ str_to_title(.x)) |>  
  tbl_summary(
    by = Arm,
    type =
      list(
        Sexual_partners_past_month ~ "continuous",
        Sexual_partners_lifetime ~ "continuous"
      ),
    statistic = list(
      all_continuous() ~ "{median} ({min}-{max})",
      all_categorical() ~ "{n} ({p}%)"
    )
  ) |> 
  add_p()|>
  modify_caption("**Site = MGH**")
Site = MGH
Characteristic Pl
N = 51
LC-106-7
N = 71
LC-115
N = 61
p-value2
Age 32 (26-38) 33 (24-40) 33 (22-40) >0.9
Race


0.14
    Asian 1 (20%) 0 (0%) 0 (0%)
    Black/African 0 (0%) 4 (57%) 4 (67%)
    White 2 (40%) 3 (43%) 1 (17%)
    Other 1 (20%) 0 (0%) 1 (17%)
    Prefer not to answer 1 (20%) 0 (0%) 0 (0%)
Ethnicity


0.7
    Not hispanic/latino/spanish 3 (60%) 6 (86%) 4 (67%)
    hispanic/latino/spanish 2 (40%) 1 (14%) 2 (33%)
Number of partners in lifetime 9 (7-10) 15 (6-90) 6 (4-10) 0.017
Number of partners in past month 1 (1-2) 1 (0-2) 1 (0-2) 0.8
Contraception


>0.9
    Continuous combined oral contraceptive pills (skip placebo) 2 (40%) 3 (43%) 1 (17%)
    Contraceptive vaginal ring 0 (0%) 1 (14%) 0 (0%)
    Cyclic combined oral contraceptive pills 0 (0%) 0 (0%) 1 (17%)
    Depo provera 0 (0%) 0 (0%) 1 (17%)
    Hormonal implant 1 (20%) 0 (0%) 0 (0%)
    Hormone containing IUD 2 (40%) 2 (29%) 2 (33%)
    Net-EN 0 (0%) 0 (0%) 0 (0%)
    Other 0 (0%) 1 (14%) 1 (17%)
Report food insecurity in past 12 months 0 (0%) 2 (29%) 1 (17%) 0.7
Gender of sexual partners


>0.9
    No sex 0 (0%) 1 (14%) 1 (17%)
    Only man 5 (100%) 6 (86%) 5 (83%)
1 Median (Min-Max); n (%)
2 Kruskal-Wallis rank sum test; Fisher’s exact test

Besides the number of lifetime sexual partners, arms appear balanced across arm at MGH.

Primary outcomes (Table 2)

LBP colonization by week 5

The primary outcome is defined as “Colonization by any of the LBP strains after product administration by week 5”, and colonization by any of the LBP strain is defined as a relative abundance of 5% by any single strain or 10% total relative abundance of the LBP strains. The relative abundance of LBP strain is estimated using metagenomics (taxonomic composition estimated using VIRGO2 and strain proportion of total L. crispatus is estimated using kSanity), and includes samples from weekly visits 1200 to 1500 (week 2 (post-INT) to week 5 (3 weeks post-INT)).

The computation itself has been done at the QC/Pre-processing stage (see the VIBRANT-0-QC-and-preprocessing GitHub repository).

Code
col_by_week5 <- 
  mae@colData |> as_tibble() |> 
  select(pid, site, randomized, arm, ITT, mITT, PP) |> 
  distinct() |> 
  filter(randomized, mITT) |> 
  left_join(
    mae[["col_LBP_mg"]] |> 
      as_tibble() |> 
      dplyr::left_join(
        mae@colData |> as_tibble() |> select(uid, pid, visit_code),
        by = join_by(.sample == uid)
      ) |> 
      dplyr::filter(visit_code == "1500", .feature == "colonized_LBP_by_mg")
  ) |> 
  mutate(LBP_colonization_by_week5 = outcome |> replace_na(FALSE)) 
Code
summary_table <- 
  col_by_week5 |> 
  group_by(site, arm) |> 
  summarize(
    n = n(), 
    n_success = sum(LBP_colonization_by_week5),
    .groups = "drop"
  ) |> 
  mutate(
    p = n_success / n,
    CI = binom::binom.confint(n_success, n, method = "exact")
  )
Code
table_2 <- 
  summary_table |> 
  mutate(
    N_per_site_and_arm = str_c("N = ", n),
    LBP_strain_detected = str_c(n_success, " (", round(p * 100)," %)"),
    CI_text = str_c(round(CI$lower * 100), "% - ", round(CI$upper * 100), "%")
  ) |> 
  dplyr::select(
    site, arm, N_per_site_and_arm, LBP_strain_detected, CI_text
  ) |> 
  pivot_longer(-c(site, arm)) |>
  mutate(
    name = 
      case_when(
        name == "N_per_site_and_arm" ~ "N participants",
        name == "LBP_strain_detected" ~ "n (%) participants\nwith LBP strain detected",
        name == "CI_text" ~ "95% CI"
      )
  ) |> 
  pivot_wider(
    id_cols = c(site, name), 
    names_from = arm,
    values_from = value, values_fill = ""
    )
Code
table_2 |> 
  group_by(site) |> 
  gt(caption = "Table 1: Colonization by week 5 by arm and site", row_group_as_column = TRUE) |> 
  cols_width(
    "name" ~ px(200),
    everything() ~ px(120)
  ) |> 
  cols_label(
    name = "",
    # Blinded = "All blinded arms",
    Pl = "Placebo",
    `LC-106-7` = "LC-106<br>7 days",
    `LC-106-3` = "LC-106<br>3 days",
    `LC-106-o` = "LC-106<br>overlap",
    `LC-115` = "LC-115<br>7 days",
    .fn = md
  )
Table 1: Colonization by week 5 by arm and site
Placebo LC-106
7 days
LC-106
3 days
LC-106
overlap
LC-115
7 days
CAP N participants N = 14 N = 14 N = 14 N = 14 N = 14
n (%) participants with LBP strain detected 0 (0 %) 9 (64 %) 7 (50 %) 7 (50 %) 10 (71 %)
95% CI 0% - 23% 35% - 87% 23% - 77% 23% - 77% 42% - 92%
MGH N participants N = 5 N = 7 N = 1 N = 1 N = 6
n (%) participants with LBP strain detected 1 (20 %) 6 (86 %) 1 (100 %) 1 (100 %) 6 (100 %)
95% CI 1% - 72% 42% - 100% 3% - 100% 3% - 100% 54% - 100%

Visualization of the primary outcome

Code
col_by_week5 |> 
  arrange(site, arm, LBP_colonization_by_week5) |>
  group_by(site, arm) |> 
  mutate(participant_number = row_number() |> factor()) |> 
  ungroup() |> 
  ggplot(aes(x = participant_number, y = site |> fct_rev(), col = site, fill = site, shape = LBP_colonization_by_week5)) +
  geom_point(size = 3) +
  facet_grid(. ~ arm, scales = "free_x", space = "free_x") +
  guides(color = "none", fill = "none") +
  scale_shape_manual("Colonized by LBP by week 5", values = c(1, 16)) +
  scale_color_manual(values = site_colors) + 
  xlab("Participant number") + ylab("") +
  theme(
    legend.position = "top",
    axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)
  ) 

Tests

We test for differences between the placebo arm and each active arm for the primary outcome using Fisher’s exact test (exhaustive permutation test).

Data from both site are merged for each arm given the low numbers of participants at MGH.

The \(p\)-values are adjusted for multiple testing (multiple active arms against the Placebo) using the Benjamini-Hochberg method.

Code
arms_to_test <- col_by_week5$arm |> unique() |> setdiff("Pl")

fisher_test_results <-
  map(
  arms_to_test,
  ~ {
    arm_to_test <- .x
    tmp <- col_by_week5 |> filter(arm %in% c("Pl", arm_to_test))
    table_for_test <- table(tmp$LBP_colonization_by_week5, tmp$arm |> fct_drop())
    test_res <- fisher.test(table_for_test, alternative = "greater")
    tibble(
      arm = arm_to_test,
      p_value = test_res$p.value,
      OR = test_res$estimate,
      lower_CI = test_res$conf.int[1],
      upper_CI = test_res$conf.int[2]
    )
  }
) |> 
  bind_rows() |> 
  mutate(
    adj_p_value = p_value |> p.adjust(method = "BH")
  )
Code
test_results <- 
  fisher_test_results |> 
  mutate(
    adj_p_value = adj_p_value |> scales::pvalue(accuracy = 0.001),
    OR = OR |> round(2) |> as.character(),
    CI = str_c(lower_CI |> round(2), " - ", upper_CI |> round(2))
    ) |> 
  select(arm, adj_p_value, OR, CI) |>
  pivot_longer(cols = -arm, names_to = "name", values_to = "value") |>
  pivot_wider(names_from = arm, values_from = value) |> 
  mutate(
    name = 
      case_when(
        name == "adj_p_value" ~ "Adj. p-value",
        name == "OR" ~ "OR",
        name == "CI" ~ "95% CI",
        TRUE ~ name
      )
  )
Code
table_2_with_model_results <- 
  bind_rows(
    table_2,
    test_results
  ) |> 
  mutate(
    site = site |> str_replace_na(""),
    Pl = 
      case_when(
        name == "Adj. p-value" ~ "Ref.",
        name == "OR" ~ "",
        str_detect(name, "CI") ~ "",
        TRUE ~ Pl
      )
  )
Code
table_2_with_model_results |> 
  group_by(site) |>
  gt(caption = "Table 2: Colonization by week 5 by arm and site", row_group_as_column = TRUE) |> 
  cols_width(
    "name" ~ px(150),
    everything() ~ px(120)
  ) |> 
  cols_label(
    name = "",
    Pl = "Placebo",
    `LC-106-7` = "LC-106<br>7 days",
    `LC-106-3` = "LC-106<br>3 days",
    `LC-106-o` = "LC-106<br>overlap",
    `LC-115` = "LC-115<br>7 days",
    .fn = md
  )
Table 2: Colonization by week 5 by arm and site
Placebo LC-106
7 days
LC-106
3 days
LC-106
overlap
LC-115
7 days
CAP N participants N = 14 N = 14 N = 14 N = 14 N = 14
n (%) participants with LBP strain detected 0 (0 %) 9 (64 %) 7 (50 %) 7 (50 %) 10 (71 %)
95% CI 35% - 87% 23% - 77% 23% - 77% 42% - 92%
MGH N participants N = 5 N = 7 N = 1 N = 1 N = 6
n (%) participants with LBP strain detected 1 (20 %) 6 (86 %) 1 (100 %) 1 (100 %) 6 (100 %)
95% CI 42% - 100% 3% - 100% 3% - 100% 54% - 100%
Adj. p-value Ref. <0.001 0.002 0.002 <0.001
OR 39.76 18.61 18.61 60.46
95% CI 5.69 - Inf 2.48 - Inf 2.48 - Inf 8.17 - Inf

Model

[This section still needs more work given the low sample size in some of the arms. Code is not executed].

Code
col_by_week5_agg <- 
  col_by_week5 |>
  group_by(site, arm) |>
  summarise(success = LBP_colonization_by_week5 |> sum(), ntrials = n()) 

bfit <- 
  brm(
    success | trials(ntrials) ~ arm + site + site:arm, 
    data = col_by_week5_agg, 
    family = binomial("logit")
    )
Code
summary(bfit)
plot(bfit, ask = FALSE)
bind_cols(col_by_week5_agg, predict(bfit))
# loo(bfit, moment_match = TRUE, reloo = TRUE)
# pp_check(bfit)
Code
plot_res <- plot(conditional_effects(bfit), ask = FALSE)
Code
plot_res <- lapply(plot_res, function(p) {
  p + scale_color_manual(values = site_colors)
})

Reduce(`+`, plot_res) + 
  plot_layout(widths = c(2.2, 1, 3))
Code
model_results <- summary(bfit)$fixed
model_results <- 
  model_results |> 
  rownames_to_column("term") |> 
  as_tibble() |> 
  mutate(
    factor = 
      case_when(
        str_detect(term, ":site") ~ "Arm x Site",
        str_detect(term, "^site") ~ "Site",
        str_detect(term, "^arm") ~ "Arm",
        TRUE ~ "Ref.",
      ),
    factor_level = 
      term |> 
      str_remove("arm") |> str_remove("site") |> str_replace(":"," x ") |> 
      str_replace("6M", "6-") |> str_replace("CM", "C-")
  ) |> 
  mutate(
    OR = ifelse(factor == "Ref.", NA, exp(Estimate)),
    lower_CI = ifelse(factor == "Ref.", NA, exp(`l-95% CI`)),
    upper_CI = ifelse(factor == "Ref.", NA, exp(`u-95% CI`))
  ) 
Code
model_results |> 
  select(factor, factor_level, OR, lower_CI, upper_CI) |>
  gt() |> 
  gt::fmt_number(
    decimals = 2, n_sigfig = 3
  )
Code
model_results_arm <- 
  model_results |> 
  filter(factor == "Arm") |> 
  mutate(
    OR = OR |> round(2) |> as.character(),
    CI = str_c(lower_CI |> round(2), " - ", upper_CI |> round(2))
    ) |> 
  select(factor_level, OR, CI) |>
  pivot_longer(cols = -factor_level, names_to = "name", values_to = "value") |>
  pivot_wider(names_from = factor_level, values_from = value) 
Code
table_2_with_model_results <- 
  bind_rows(
    table_2,
    model_results_arm
  ) |> 
  mutate(
    site = site |> str_replace_na(""),
    Pl = 
      case_when(
        name == "OR" ~ "Ref.",
        name == "CI" ~ "",
        TRUE ~ Pl
      )
  )
Code
table_2_with_model_results |> 
  group_by(site) |>
  gt(caption = "Table 2: Colonization by week 5 by arm and site", row_group_as_column = TRUE) |> 
  cols_width(
    "name" ~ px(200),
    everything() ~ px(120)
  ) |> 
  cols_label(
    name = "",
    Pl = "Placebo",
    `LC-106-7` = "LC-106<br>7 days",
    `LC-106-3` = "LC-106<br>3 days",
    `LC-106-o` = "LC-106<br>overlap",
    `LC-115` = "LC-115<br>7 days",
    .fn = md
  )

Sensitivity analyses (Supplementary tables and figures)

LBP colonization by week 5 excluding immediately post-product visits

Defining the alternative primary outcome

Code
new_outcome_def <- 
  # we start from the "colonized at"
  mae[["col_LBP_mg"]] |> 
  as_tibble() |> 
  dplyr::filter(.feature == "colonized_LBP_at_mg") |> 
  dplyr::left_join(
    mae@colData |> as_tibble() |> select(uid, pid, visit_code, arm),
    by = join_by(.sample == uid)
  ) |> 
  filter(!is.na(arm))

# we fill the missing visits
new_outcome_def <- 
  new_outcome_def |> 
  select(-c(arm)) |> 
  dplyr::full_join(
    new_outcome_def |> 
      select(pid, arm) |> 
      distinct() |> 
      expand_grid(
        visit_code = 
          unique(new_outcome_def$visit_code) |> sort() |> setdiff(c("0001","0010"))
        ),
    by = join_by(pid, visit_code)
  ) |> 
  arrange(pid, visit_code) |> 
  mutate(outcome = outcome |> replace_na(FALSE)) |> 
  dplyr::filter(!is.na(arm))
  
# we compute the new outcome
new_outcome_def <- 
  new_outcome_def |>  
  arrange(pid, visit_code) |> 
  group_by(pid) |> 
  mutate(
    tmp = 
      case_when(
        (arm %in% c("Pl", "LC-106-7", "LC-115")) & (as.numeric(visit_code) <= 1200) ~ FALSE,
        (arm %in% c("LC-106-3", "LC-106-o")) & (as.numeric(visit_code) < 1200) ~ FALSE,
        TRUE ~ outcome
      ),
    colonized_LBP_by_mg = cummax(tmp) |> as.logical()
  ) |> 
  ungroup()
Code
new_outcome_def |> 
  left_join(mae@colData |> as_tibble() |> select(pid, mITT) |> distinct()) |>
  ggplot() +
  aes(y = visit_code |> fct_rev(), x = pid, fill = colonized_LBP_by_mg) +
  facet_grid(. ~ arm + ifelse(mITT, "mITT", "ITT"), scales = "free", space = "free") +
  geom_tile() + 
  scale_fill_manual(
    "Colonized by LBP by MG by each visit",
    values = c("gray", "orangered"), labels = c("No", "Yes")
    ) +
  ylab("Visit code") + xlab("Participant IDs") +
  labs(
    caption = "Colonization at missed visits was considered negative."
  ) +
  theme(
    axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5),
    legend.position = "bottom"
  )

Code
new_outcome_def <- 
  new_outcome_def |> 
  dplyr::rename(colonized_LBP_by_mg_alt = colonized_LBP_by_mg) |> 
  left_join(
    mae[["col_LBP_mg"]] |> 
      as_tibble() |> 
      dplyr::filter(.feature == "colonized_LBP_by_mg") |> 
      select(.sample, outcome) |> 
      dplyr::rename(colonized_LBP_by_mg = outcome)
  )
Code
new_outcome_def |> 
  dplyr::filter(!is.na(colonized_LBP_by_mg)) |> 
  dplyr::count(colonized_LBP_by_mg_alt, colonized_LBP_by_mg)
new_outcome_def |> 
  dplyr::filter(visit_code == "1500", !is.na(colonized_LBP_by_mg)) |>
  dplyr::count(colonized_LBP_by_mg_alt, colonized_LBP_by_mg)
Code
col_by_week5 <- 
  new_outcome_def |> 
  select(pid, visit_code, colonized_LBP_by_mg_alt) |> 
  left_join(
    mae@colData |> as_tibble() |> 
      select(uid, pid, visit_code, site, randomized, arm, ITT, mITT, PP)
    ) |> 
  dplyr::filter(randomized, mITT) |> 
  dplyr::filter(visit_code == "1500") |> 
  mutate(LBP_colonization_by_week5 = colonized_LBP_by_mg_alt |> replace_na(FALSE)) 

Visualization of the alternative primary outcome

Code
col_by_week5 |> 
  arrange(site, arm, LBP_colonization_by_week5) |>
  group_by(site, arm) |> 
  mutate(participant_number = row_number() |> factor()) |> 
  ungroup() |> 
  ggplot(aes(x = participant_number, y = site |> fct_rev(), col = site, fill = site, shape = LBP_colonization_by_week5)) +
  geom_point(size = 3) +
  facet_grid(. ~ arm, scales = "free_x", space = "free_x") +
  guides(color = "none", fill = "none") +
  scale_shape_manual("Colonized by LBP by week 5 (excluding immediately post-product visit)", values = c(1, 16)) +
  scale_color_manual(values = site_colors) + 
  xlab("Participant number") + ylab("") +
  theme(
    legend.position = "top",
    axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)
  ) 

Table

Code
summary_table <- 
  col_by_week5 |> 
  group_by(site, arm) |> 
  summarize(
    n = n(), 
    n_success = sum(LBP_colonization_by_week5),
    .groups = "drop"
  ) |> 
  mutate(
    p = n_success / n,
    CI = binom::binom.confint(n_success, n, method = "exact")
  )
Code
table_2 <- 
  summary_table |> 
  mutate(
    N_per_site_and_arm = str_c("N = ", n),
    LBP_strain_detected = str_c(n_success, " (", round(p * 100)," %)"),
    CI_text = str_c(round(CI$lower * 100), "% - ", round(CI$upper * 100), "%")
  ) |> 
  dplyr::select(
    site, arm, N_per_site_and_arm, LBP_strain_detected, CI_text
  ) |> 
  pivot_longer(-c(site, arm)) |>
  mutate(
    name = 
      case_when(
        name == "N_per_site_and_arm" ~ "N participants",
        name == "LBP_strain_detected" ~ "n (%) participants\nwith LBP strain detected",
        name == "CI_text" ~ "95% CI"
      )
  ) |> 
  pivot_wider(
    id_cols = c(site, name), 
    names_from = arm,
    values_from = value, values_fill = ""
    )
Code
arms_to_test <- col_by_week5$arm |> unique() |> setdiff("Pl")

fisher_test_results <-
  map(
  arms_to_test,
  ~ {
    arm_to_test <- .x
    tmp <- col_by_week5 |> filter(arm %in% c("Pl", arm_to_test))
    table_for_test <- table(tmp$LBP_colonization_by_week5, tmp$arm |> fct_drop())
    test_res <- fisher.test(table_for_test, alternative = "greater")
    tibble(
      arm = arm_to_test,
      p_value = test_res$p.value,
      OR = test_res$estimate,
      lower_CI = test_res$conf.int[1],
      upper_CI = test_res$conf.int[2]
    )
  }
) |> 
  bind_rows() |> 
  mutate(
    adj_p_value = p_value |> p.adjust(method = "BH")
  )
Code
test_results <- 
  fisher_test_results |> 
  mutate(
    adj_p_value = adj_p_value |> scales::pvalue(accuracy = 0.001),
    OR = OR |> round(2) |> as.character(),
    CI = str_c(lower_CI |> round(2), " - ", upper_CI |> round(2))
    ) |> 
  select(arm, adj_p_value, OR, CI) |>
  pivot_longer(cols = -arm, names_to = "name", values_to = "value") |>
  pivot_wider(names_from = arm, values_from = value) |> 
  mutate(
    name = 
      case_when(
        name == "adj_p_value" ~ "Adj. p-value",
        name == "OR" ~ "OR",
        name == "CI" ~ "95% CI",
        TRUE ~ name
      )
  )
Code
table_2_with_model_results <- 
  bind_rows(
    table_2,
    test_results |> filter(name != "Adj. p-value")
  ) |> 
  mutate(
    site = site |> str_replace_na(""),
    Pl = 
      case_when(
        name == "Adj. p-value" ~ "Ref.",
        name == "OR" ~ "Ref.",
        str_detect(name, "CI") ~ "",
        TRUE ~ Pl
      )
  )
Code
table_2_with_model_results |> 
  group_by(site) |>
  gt(
    caption = "LBP Colonization by week 5 by arm and site excluding immediately post-product visit.", 
    row_group_as_column = TRUE
    ) |> 
  cols_width(
    "name" ~ px(150),
    everything() ~ px(120)
  ) |> 
  cols_label(
    name = "",
    Pl = "Placebo",
    `LC-106-7` = "LC-106<br>7 days",
    `LC-106-3` = "LC-106<br>3 days",
    `LC-106-o` = "LC-106<br>overlap",
    `LC-115` = "LC-115<br>7 days",
    .fn = md
  )
LBP Colonization by week 5 by arm and site excluding immediately post-product visit.
Placebo LC-106
7 days
LC-106
3 days
LC-106
overlap
LC-115
7 days
CAP N participants N = 14 N = 14 N = 14 N = 14 N = 14
n (%) participants with LBP strain detected 0 (0 %) 8 (57 %) 7 (50 %) 7 (50 %) 6 (43 %)
95% CI 29% - 82% 23% - 77% 23% - 77% 18% - 71%
MGH N participants N = 5 N = 7 N = 1 N = 1 N = 6
n (%) participants with LBP strain detected 1 (20 %) 4 (57 %) 1 (100 %) 1 (100 %) 4 (67 %)
95% CI 18% - 90% 3% - 100% 3% - 100% 22% - 96%
OR Ref. 39.76 18.61 18.61 60.46
95% CI 5.69 - Inf 2.48 - Inf 2.48 - Inf 8.17 - Inf

LBP colonization by week 5, qPCR-estimated relative abundances

These analyses include the immediate post-product visits.

Code
# by qPCR 

col_by_week5_qPCR <-
  mae@colData |> as_tibble() |> 
  select(pid, site, randomized, arm, ITT, mITT, PP) |> 
  distinct() |> 
  filter(randomized, mITT) |> 
  left_join(
    mae[["col_LBP_qPCR"]] |> 
      as_tibble() |> 
      dplyr::left_join(
        mae@colData |> as_tibble() |> select(uid, pid, visit_code),
        by = join_by(.sample == uid)
      ) |> 
      dplyr::filter(visit_code == "1500", .feature == "colonized_LBP_by_qpcr")
  ) |> 
  mutate(LBP_colonization_by_week5 = outcome |> replace_na(FALSE)) 
Code
col_by_week5_qPCR |> 
  arrange(site, arm, LBP_colonization_by_week5) |>
  group_by(site, arm) |> 
  mutate(participant_number = row_number() |> factor()) |> 
  ungroup() |> 
  ggplot(aes(x = participant_number, y = site |> fct_rev(), col = site, fill = site, shape = LBP_colonization_by_week5)) +
  geom_point(size = 3) +
  facet_grid(. ~ arm, scales = "free_x", space = "free_x") +
  guides(color = "none", fill = "none") +
  scale_shape_manual("Colonized by LBP by week 5", values = c(1, 16)) +
  scale_color_manual(values = site_colors) + 
  xlab("Participant number") + ylab("") +
  ggtitle("Colonization based on the relative abundances estimated by qPCR") +
  theme(
    legend.position = "top",
    axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5), 
    plot.title = element_text(hjust = 0.5)
  ) 

Table

Code
summary_table <- 
  col_by_week5_qPCR |> 
  group_by(site, arm) |> 
  summarize(
    n = n(), 
    n_success = sum(LBP_colonization_by_week5),
    .groups = "drop"
  ) |> 
  mutate(
    p = n_success / n,
    CI = binom::binom.confint(n_success, n, method = "exact")
  )
Code
table_2 <- 
  summary_table |> 
  mutate(
    N_per_site_and_arm = str_c("N = ", n),
    LBP_strain_detected = str_c(n_success, " (", round(p * 100)," %)"),
    CI_text = str_c(round(CI$lower * 100), "% - ", round(CI$upper * 100), "%")
  ) |> 
  dplyr::select(
    site, arm, N_per_site_and_arm, LBP_strain_detected, CI_text
  ) |> 
  pivot_longer(-c(site, arm)) |>
  mutate(
    name = 
      case_when(
        name == "N_per_site_and_arm" ~ "N participants",
        name == "LBP_strain_detected" ~ "n (%) participants\nwith LBP strain detected",
        name == "CI_text" ~ "95% CI"
      )
  ) |> 
  pivot_wider(
    id_cols = c(site, name), 
    names_from = arm,
    values_from = value, values_fill = ""
    )
Code
arms_to_test <- col_by_week5$arm |> unique() |> setdiff("Pl")

fisher_test_results <-
  map(
  arms_to_test,
  ~ {
    arm_to_test <- .x
    tmp <- col_by_week5_qPCR |> filter(arm %in% c("Pl", arm_to_test))
    table_for_test <- table(tmp$LBP_colonization_by_week5, tmp$arm |> fct_drop())
    test_res <- fisher.test(table_for_test, alternative = "greater")
    tibble(
      arm = arm_to_test,
      p_value = test_res$p.value,
      OR = test_res$estimate,
      lower_CI = test_res$conf.int[1],
      upper_CI = test_res$conf.int[2]
    )
  }
) |> 
  bind_rows() |> 
  mutate(
    adj_p_value = p_value |> p.adjust(method = "BH")
  )
Code
test_results <- 
  fisher_test_results |> 
  mutate(
    adj_p_value = adj_p_value |> scales::pvalue(accuracy = 0.001),
    OR = OR |> round(2) |> as.character(),
    CI = str_c(lower_CI |> round(2), " - ", upper_CI |> round(2))
    ) |> 
  select(arm, adj_p_value, OR, CI) |>
  pivot_longer(cols = -arm, names_to = "name", values_to = "value") |>
  pivot_wider(names_from = arm, values_from = value) |> 
  mutate(
    name = 
      case_when(
        name == "adj_p_value" ~ "Adj. p-value",
        name == "OR" ~ "OR",
        name == "CI" ~ "95% CI",
        TRUE ~ name
      )
  )
Code
table_2_with_model_results <- 
  bind_rows(
    table_2,
    test_results |> filter(name != "Adj. p-value")
  ) |> 
  mutate(
    site = site |> str_replace_na(""),
    Pl = 
      case_when(
        name == "Adj. p-value" ~ "Ref.",
        name == "OR" ~ "Ref.",
        str_detect(name, "CI") ~ "",
        TRUE ~ Pl
      )
  )
Code
table_2_with_model_results |> 
  group_by(site) |>
  gt(
    caption = "LBP Colonization by week 5 by arm, with colonization definition based on qPCR-estimated relative abundance.", 
    row_group_as_column = TRUE
    ) |> 
  cols_width(
    "name" ~ px(150),
    everything() ~ px(120)
  ) |> 
  cols_label(
    name = "",
    Pl = "Placebo",
    `LC-106-7` = "LC-106<br>7 days",
    `LC-106-3` = "LC-106<br>3 days",
    `LC-106-o` = "LC-106<br>overlap",
    `LC-115` = "LC-115<br>7 days",
    .fn = md
  )
LBP Colonization by week 5 by arm, with colonization definition based on qPCR-estimated relative abundance.
Placebo LC-106
7 days
LC-106
3 days
LC-106
overlap
LC-115
7 days
CAP N participants N = 14 N = 14 N = 14 N = 14 N = 14
n (%) participants with LBP strain detected 2 (14 %) 11 (79 %) 8 (57 %) 8 (57 %) 10 (71 %)
95% CI 49% - 95% 29% - 82% 29% - 82% 42% - 92%
MGH N participants N = 5 N = 7 N = 1 N = 1 N = 6
n (%) participants with LBP strain detected 1 (20 %) 6 (86 %) 1 (100 %) 1 (100 %) 6 (100 %)
95% CI 42% - 100% 3% - 100% 3% - 100% 54% - 100%
OR Ref. 20.25 7.44 7.44 19.07
95% CI 4.44 - Inf 1.62 - Inf 1.62 - Inf 4.15 - Inf

LBP detection by week 5, qPCR-based detection

In this analysis, LBP is considered to be detected if at least 2 LBP strains had observed Cq values at the same visit by week 5. “By week 5” includes all post-product visits from visit_code = 1200 (no excluding the immediate post-product visits.

Code
detected_by_week5_qPCR <-
  mae@colData |> as_tibble() |> 
  select(pid, site, randomized, arm, ITT, mITT, PP) |> 
  distinct() |> 
  filter(randomized, mITT) |> 
  left_join(
    mae[["col_LBP_qPCR"]] |> 
      as_tibble() |> 
      dplyr::left_join(
        mae@colData |> as_tibble() |> select(uid, pid, visit_code),
        by = join_by(.sample == uid)
      ) |> 
      dplyr::filter(visit_code == "1500", .feature == "LBP_detected_by_qpcr")
  ) |> 
  mutate(LBP_detected_by_week5 = outcome |> replace_na(FALSE)) 
Code
detected_by_week5_qPCR |> 
  arrange(site, arm, LBP_detected_by_week5) |>
  group_by(site, arm) |> 
  mutate(participant_number = row_number() |> factor()) |> 
  ungroup() |> 
  ggplot(aes(x = participant_number, y = site |> fct_rev(), col = site, fill = site, shape = LBP_detected_by_week5)) +
  geom_point(size = 3) +
  facet_grid(. ~ arm, scales = "free_x", space = "free_x") +
  guides(color = "none", fill = "none") +
  scale_shape_manual("Colonized by LBP by week 5", values = c(1, 16)) +
  scale_color_manual(values = site_colors) + 
  xlab("Participant number") + ylab("") +
  ggtitle("Colonization based on qPCR-based detection of at least 2 LBP strains detected at the same visit by week 5") +
  theme(
    legend.position = "top",
    axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5), 
    plot.title = element_text(hjust = 0.5)
  ) 

Table

Code
summary_table <- 
  detected_by_week5_qPCR |> 
  group_by(site, arm) |> 
  summarize(
    n = n(), 
    n_success = sum(LBP_detected_by_week5),
    .groups = "drop"
  ) |> 
  mutate(
    p = n_success / n,
    CI = binom::binom.confint(n_success, n, method = "exact")
  )
Code
table_2 <- 
  summary_table |> 
  mutate(
    N_per_site_and_arm = str_c("N = ", n),
    LBP_strain_detected = str_c(n_success, " (", round(p * 100)," %)"),
    CI_text = str_c(round(CI$lower * 100), "% - ", round(CI$upper * 100), "%")
  ) |> 
  dplyr::select(
    site, arm, N_per_site_and_arm, LBP_strain_detected, CI_text
  ) |> 
  pivot_longer(-c(site, arm)) |>
  mutate(
    name = 
      case_when(
        name == "N_per_site_and_arm" ~ "N participants",
        name == "LBP_strain_detected" ~ "n (%) participants\nwith LBP strain detected",
        name == "CI_text" ~ "95% CI"
      )
  ) |> 
  pivot_wider(
    id_cols = c(site, name), 
    names_from = arm,
    values_from = value, values_fill = ""
    )
Code
arms_to_test <- col_by_week5$arm |> unique() |> setdiff("Pl")

fisher_test_results <-
  map(
  arms_to_test,
  ~ {
    arm_to_test <- .x
    tmp <- detected_by_week5_qPCR |> filter(arm %in% c("Pl", arm_to_test))
    table_for_test <- table(tmp$LBP_detected_by_week5, tmp$arm |> fct_drop())
    test_res <- fisher.test(table_for_test, alternative = "greater")
    tibble(
      arm = arm_to_test,
      p_value = test_res$p.value,
      OR = test_res$estimate,
      lower_CI = test_res$conf.int[1],
      upper_CI = test_res$conf.int[2]
    )
  }
) |> 
  bind_rows() |> 
  mutate(
    adj_p_value = p_value |> p.adjust(method = "BH")
  )
Code
test_results <- 
  fisher_test_results |> 
  mutate(
    adj_p_value = adj_p_value |> scales::pvalue(accuracy = 0.001),
    OR = OR |> round(2) |> as.character(),
    CI = str_c(lower_CI |> round(2), " - ", upper_CI |> round(2))
    ) |> 
  select(arm, adj_p_value, OR, CI) |>
  pivot_longer(cols = -arm, names_to = "name", values_to = "value") |>
  pivot_wider(names_from = arm, values_from = value) |> 
  mutate(
    name = 
      case_when(
        name == "adj_p_value" ~ "Adj. p-value",
        name == "OR" ~ "OR",
        name == "CI" ~ "95% CI",
        TRUE ~ name
      )
  )
Code
table_2_with_model_results <- 
  bind_rows(
    table_2,
    test_results |> filter(name != "Adj. p-value")
  ) |> 
  mutate(
    site = site |> str_replace_na(""),
    Pl = 
      case_when(
        name == "Adj. p-value" ~ "Ref.",
        name == "OR" ~ "Ref.",
        str_detect(name, "CI") ~ "",
        TRUE ~ Pl
      )
  )
Code
table_2_with_model_results |> 
  group_by(site) |>
  gt(
    caption = "LBP detection by week 5 by arm, with LBP detection defined as at least two LBP strains simultaneously detected by qPCR.", 
    row_group_as_column = TRUE
    ) |> 
  cols_width(
    "name" ~ px(150),
    everything() ~ px(120)
  ) |> 
  cols_label(
    name = "",
    Pl = "Placebo",
    `LC-106-7` = "LC-106<br>7 days",
    `LC-106-3` = "LC-106<br>3 days",
    `LC-106-o` = "LC-106<br>overlap",
    `LC-115` = "LC-115<br>7 days",
    .fn = md
  )
LBP detection by week 5 by arm, with LBP detection defined as at least two LBP strains simultaneously detected by qPCR.
Placebo LC-106
7 days
LC-106
3 days
LC-106
overlap
LC-115
7 days
CAP N participants N = 14 N = 14 N = 14 N = 14 N = 14
n (%) participants with LBP strain detected 2 (14 %) 12 (86 %) 8 (57 %) 9 (64 %) 11 (79 %)
95% CI 57% - 98% 29% - 82% 35% - 87% 49% - 95%
MGH N participants N = 5 N = 7 N = 1 N = 1 N = 6
n (%) participants with LBP strain detected 3 (60 %) 6 (86 %) 1 (100 %) 1 (100 %) 6 (100 %)
95% CI 42% - 100% 3% - 100% 3% - 100% 54% - 100%
OR Ref. 15.28 4.01 5.29 14.44
95% CI 3.49 - Inf 0.99 - Inf 1.29 - Inf 3.28 - Inf

L. crispatus dominance by week 5

In this analysis, L. crispatus dominance is defined as a metagenomic-estimated relative abundance ≥ 50%. “By week 5” includes all post-product visits from visit_code = 1200 (no excluding the immediate post-product visits.

Code
# crispatus dominance 

crisp_dom_by_5week <- 
  mae@colData |> as_tibble() |> 
  select(pid, site, randomized, arm, ITT, mITT, PP) |> 
  distinct() |> 
  filter(randomized, mITT) |> 
  left_join(
    mae[["col_crispatus_mg"]] |> 
      as_tibble() |> 
      dplyr::left_join(
        mae@colData |> as_tibble() |> select(uid, pid, visit_code),
        by = join_by(.sample == uid)
      ) |> 
      dplyr::filter(visit_code == "1500", .feature == "crispatus_dominance_by_mg")
  ) |> 
  mutate(crisp_dom_by_5week = outcome |> replace_na(FALSE)) 
Code
crisp_dom_by_5week |> 
  arrange(site, arm, crisp_dom_by_5week) |>
  group_by(site, arm) |> 
  mutate(participant_number = row_number() |> factor()) |> 
  ungroup() |> 
  ggplot(aes(x = participant_number, y = site |> fct_rev(), col = site, fill = site, shape = crisp_dom_by_5week)) +
  geom_point(size = 3) +
  facet_grid(. ~ arm, scales = "free_x", space = "free_x") +
  guides(color = "none", fill = "none") +
  scale_shape_manual("Colonized by LBP by week 5", values = c(1, 16)) +
  scale_color_manual(values = site_colors) + 
  xlab("Participant number") + ylab("") +
  ggtitle("L. crispatus dominance (≥ 50% by metagenomics)") +
  theme(
    legend.position = "top",
    axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5), 
    plot.title = element_text(hjust = 0.5)
  ) 

Table

Code
summary_table <- 
  crisp_dom_by_5week |> 
  group_by(site, arm) |> 
  summarize(
    n = n(), 
    n_success = sum(crisp_dom_by_5week),
    .groups = "drop"
  ) |> 
  mutate(
    p = n_success / n,
    CI = binom::binom.confint(n_success, n, method = "exact")
  )
Code
table_2 <- 
  summary_table |> 
  mutate(
    N_per_site_and_arm = str_c("N = ", n),
    Lc_dom = str_c(n_success, " (", round(p * 100)," %)"),
    CI_text = str_c(round(CI$lower * 100), "% - ", round(CI$upper * 100), "%")
  ) |> 
  dplyr::select(
    site, arm, N_per_site_and_arm, Lc_dom, CI_text
  ) |> 
  pivot_longer(-c(site, arm)) |>
  mutate(
    name = 
      case_when(
        name == "N_per_site_and_arm" ~ "N participants",
        name == "Lc_dom" ~ "n (%) participants\n≥ 50% L. crispatus",
        name == "CI_text" ~ "95% CI"
      )
  ) |> 
  pivot_wider(
    id_cols = c(site, name), 
    names_from = arm,
    values_from = value, values_fill = ""
    )
Code
arms_to_test <- col_by_week5$arm |> unique() |> setdiff("Pl")

fisher_test_results <-
  map(
  arms_to_test,
  ~ {
    arm_to_test <- .x
    tmp <- crisp_dom_by_5week |> filter(arm %in% c("Pl", arm_to_test))
    table_for_test <- table(tmp$crisp_dom_by_5week, tmp$arm |> fct_drop())
    test_res <- fisher.test(table_for_test, alternative = "greater")
    tibble(
      arm = arm_to_test,
      p_value = test_res$p.value,
      OR = test_res$estimate,
      lower_CI = test_res$conf.int[1],
      upper_CI = test_res$conf.int[2]
    )
  }
) |> 
  bind_rows() |> 
  mutate(
    adj_p_value = p_value |> p.adjust(method = "BH")
  )
Code
test_results <- 
  fisher_test_results |> 
  mutate(
    adj_p_value = adj_p_value |> scales::pvalue(accuracy = 0.001),
    OR = OR |> round(2) |> as.character(),
    CI = str_c(lower_CI |> round(2), " - ", upper_CI |> round(2))
    ) |> 
  select(arm, adj_p_value, OR, CI) |>
  pivot_longer(cols = -arm, names_to = "name", values_to = "value") |>
  pivot_wider(names_from = arm, values_from = value) |> 
  mutate(
    name = 
      case_when(
        name == "adj_p_value" ~ "Adj. p-value",
        name == "OR" ~ "OR",
        name == "CI" ~ "95% CI",
        TRUE ~ name
      )
  )
Code
table_2_with_model_results <- 
  bind_rows(
    table_2,
    test_results |> filter(name != "Adj. p-value")
  ) |> 
  mutate(
    site = site |> str_replace_na(""),
    Pl = 
      case_when(
        name == "Adj. p-value" ~ "Ref.",
        name == "OR" ~ "Ref.",
        str_detect(name, "CI") ~ "",
        TRUE ~ Pl
      )
  )
Code
table_2_with_model_results |> 
  group_by(site) |>
  gt(
    caption = "Ever ≥50% L. crispatus by week 5 by arm.", 
    row_group_as_column = TRUE
    ) |> 
  cols_width(
    "name" ~ px(150),
    everything() ~ px(120)
  ) |> 
  cols_label(
    name = "",
    Pl = "Placebo",
    `LC-106-7` = "LC-106<br>7 days",
    `LC-106-3` = "LC-106<br>3 days",
    `LC-106-o` = "LC-106<br>overlap",
    `LC-115` = "LC-115<br>7 days",
    .fn = md
  )
Ever ≥50% L. crispatus by week 5 by arm.
Placebo LC-106
7 days
LC-106
3 days
LC-106
overlap
LC-115
7 days
CAP N participants N = 14 N = 14 N = 14 N = 14 N = 14
n (%) participants ≥ 50% L. crispatus 0 (0 %) 8 (57 %) 5 (36 %) 6 (43 %) 10 (71 %)
95% CI 29% - 82% 13% - 65% 18% - 71% 42% - 92%
MGH N participants N = 5 N = 7 N = 1 N = 1 N = 6
n (%) participants ≥ 50% L. crispatus 1 (20 %) 6 (86 %) 1 (100 %) 1 (100 %) 5 (83 %)
95% CI 42% - 100% 3% - 100% 3% - 100% 36% - 100%
OR Ref. 32.33 11.13 14.44 46.74
95% CI 4.69 - Inf 1.43 - Inf 1.9 - Inf 6.51 - Inf

Secondary outcomes (Figures 2-4)

Code
fig_title <- mae_title(mae)

Kinetics of LBP colonization

Total relative abundances of LBP strains

Code
total_rel_ab_of_LBP <- 
  mae[["mg"]] |> 
  as_tibble() |> 
  filter(!is.na(LBP), !poor_quality_mg_data) |> 
  left_join(
    mae |> colData() |> as_tibble(), 
    by = join_by(uid)
  ) |>
  filter(randomized, mITT) |> 
  group_by(.sample, uid) |>
  summarize(
    total_LBP_rel_ab = sum(rel_ab, na.rm = TRUE),
    .groups = "drop"
  )

total_rel_ab_of_LBP <- 
  total_rel_ab_of_LBP |> 
  dplyr::left_join(mae |> colData() |> as_tibble(), by = join_by(uid))


total_rel_ab_of_LBP <- 
  total_rel_ab_of_LBP |> 
  left_join(
    total_rel_ab_of_LBP |> 
      select(pid, arm, site) |> 
      distinct() |> 
      group_by(arm, site) |> 
      arrange(pid) |> 
      mutate(pid_nb = row_number()) |> 
      ungroup()
  )
Code
g_tot_prop_LBP_strains <- 
  total_rel_ab_of_LBP |>
  filter(visit_type == "Clinic", !is.na(arm), PIPV) |> 
  ggplot(aes(x = study_week, y = total_LBP_rel_ab, col = arm)) +
  geom_line(aes(group = pid), alpha = 0.5) +
  geom_point(size = 0.5, alpha = 0.5) +
  facet_grid(site ~ arm) + 
  scale_x_continuous("Study weeks", breaks = c(-1:5,7,9,12), minor_breaks = 0:12) +
  scale_y_continuous(
    "Total relative abundances\nof LBP strains", labels = scales::percent,
    breaks = seq(0, 1, by = 0.2)
  ) +
  expand_limits(y = 0) +
  scale_color_manual(values = arm_colors) +
  guides(col = "none") +
  ggtitle(fig_title) 
Code
g_tot_prop_LBP_strains

Code
g_tot_prop_LBP_strains +
  geom_label(aes(label = pid_nb), size = 2) +
  labs(
    caption = "Participants are labeled by their enrollment order within each arm and site."
  )

Proportions of participants who colonized in each arm and each visit

Code
get_longitudinal_proportions_of_participants_who_colonized <- 
  function(total_rel_ab_of_LBP) {
    total_rel_ab_of_LBP |> 
      filter(!is.na(arm), PIPV) |> 
      group_by(site, arm, study_week) |>
      summarize(
        prop_who_colonized_at = mean(total_LBP_rel_ab > 0.05), 
        CI_low = 
          prop.test(
            x = sum(total_LBP_rel_ab > 0.05), n = n(), 
            conf.level = 0.95)$conf.int[1],
        CI_high = 
          prop.test(
            x = sum(total_LBP_rel_ab > 0.05), n = n(), 
            conf.level = 0.95)$conf.int[2],
        .groups = "drop"
      )
  }
Code
plot_longitudinal_proportions_of_participants_who_colonized <- function(total_rel_ab_of_LBP){
    
    get_longitudinal_proportions_of_participants_who_colonized(total_rel_ab_of_LBP) |>
      ggplot() +
      aes(x = study_week, y = prop_who_colonized_at, col = arm) +
      facet_grid(site ~ .) +
      geom_ribbon(
        aes(ymin = CI_low, ymax = CI_high, fill = arm), 
        alpha = 0.1, color = NA
      ) +
      geom_line() +
      geom_point() + 
      scale_x_continuous("Study weeks", breaks = c(-1:5,7,9,12), minor_breaks = 0:12) +
      scale_y_continuous(
        "Proportion of participants\nwith detected LBP strains", labels = scales::percent,
        breaks = seq(0, 1, by = 0.2)
      )  +
      expand_limits(y = 0) +
      scale_color_manual("Arm", values = arm_colors) +
      scale_fill_manual("Arm", values = arm_colors) 
  }
Code
g_prop_colonized_by <- plot_longitudinal_proportions_of_participants_who_colonized(total_rel_ab_of_LBP)
g_prop_colonized_by + ggtitle(fig_title) 

Code
g_prop_colonized_by <- 
  plot_longitudinal_proportions_of_participants_who_colonized(
    total_rel_ab_of_LBP |> filter(!((arm %in% c("LC-106-3", "LC-106-o")) & (site == "MGH")))
    )
g_prop_colonized_by + ggtitle(fig_title) 

Code
plot_longitudinal_proportions_of_participants_who_colonized_v2 <- 
  function(total_rel_ab_of_LBP){
    
    get_longitudinal_proportions_of_participants_who_colonized(total_rel_ab_of_LBP) |>
      ggplot(aes(x = study_week, y = prop_who_colonized_at, col = arm)) +
      geom_linerange(
        aes(ymin = CI_low, ymax = CI_high, col = arm), 
        alpha = 0.4, lineend = "round", linewidth = 1,
        position = position_dodge(width = 0.5)
      ) +
      geom_line(position = position_dodge(width = 0.5), linewidth = 0.8) +
      geom_point(aes(shape = arm), position = position_dodge(width = 0.5), size = 2) +
      facet_grid(site ~ .) + 
      scale_shape_manual("Arm", values = c(1, 19, 15, 17, 8)) +
      scale_x_continuous("Study weeks", breaks = c(-1:5,7,9,12), minor_breaks = 0:12) +
      scale_y_continuous(
        "Proportion of participants\nwith detected LBP strains", labels = scales::percent,
        breaks = seq(0, 1, by = 0.2)
      ) +
      expand_limits(y = 0) +
      scale_color_manual("Arm", values = arm_colors) +
      scale_fill_manual("Arm", values = arm_colors)
    
  }

Alternative visualization:

Code
g_prop_colonized_by <- plot_longitudinal_proportions_of_participants_who_colonized_v2(total_rel_ab_of_LBP)
g_prop_colonized_by +  ggtitle(fig_title) 

Code
g_prop_colonized_by <- 
  plot_longitudinal_proportions_of_participants_who_colonized_v2(
    total_rel_ab_of_LBP |> filter(!((arm %in% c("LC-106-3", "LC-106-o")) & (site == "MGH")))
    )

g_prop_colonized_by +  ggtitle(fig_title) 

Kinetics of L. crispatus colonization

Total relative abundances of L. crispatus

Code
total_rel_ab_of_Lc <- 
  mae[["mg"]] |> 
  as_tibble() |> 
  filter(str_detect(species, "crispatus"), !poor_quality_mg_data) |> 
  left_join(
    mae |> colData() |> as_tibble(), 
    by = join_by(uid)
  ) |>
  filter(randomized, mITT) |> 
  group_by(.sample, uid) |>
  summarize(
    total_Lc_rel_ab = sum(rel_ab, na.rm = TRUE),
    .groups = "drop"
  )

total_rel_ab_of_Lc <- 
  total_rel_ab_of_Lc |> 
  dplyr::left_join(mae |> colData() |> as_tibble(), by = join_by(uid))


# we add a pid_nb 
total_rel_ab_of_Lc <- 
  total_rel_ab_of_Lc |> 
  left_join(
    total_rel_ab_of_Lc |> 
      select(pid, arm, site) |> 
      distinct() |> 
      group_by(arm, site) |> 
      arrange(pid) |> 
      mutate(pid_nb = row_number()) |> 
      ungroup()
  )
Code
g_tot_prop_Lc <- 
  total_rel_ab_of_Lc |>
  filter(visit_type == "Clinic", !is.na(arm), PIPV) |> 
  ggplot(aes(x = study_week, y = total_Lc_rel_ab, col = arm)) +
  geom_line(aes(group = pid), alpha = 0.5) +
  geom_point(size = 0.5, alpha = 0.5) +
  facet_grid(site ~ arm) + 
  scale_x_continuous("Study weeks", breaks = c(-1:5,7,9,12), minor_breaks = 0:12) +
  scale_y_continuous(
    "Total relative abundances\nof L. crispatus", labels = scales::percent,
    breaks = seq(0, 1, by = 0.2)
  ) +
  expand_limits(y = 0) +
  scale_color_manual(values = arm_colors) +
  guides(col = "none") +
  ggtitle(mae_title(mae)) 
Code
g_tot_prop_Lc

Code
g_tot_prop_Lc +
  geom_label(aes(label = pid_nb), size = 2) +
  labs(
    caption = "Participants are labeled by their enrollment order within each arm and site."
  )

Proportions of participants who colonized with 50% L. crispatus in each arm and each visit

Code
get_longitudinal_proportions_of_participants_who_colonized <- 
  function(total_rel_ab_of_Lc) {
    total_rel_ab_of_Lc |> 
      filter(!is.na(arm), PIPV) |> 
      group_by(site, arm, study_week) |>
      summarize(
        prop_who_colonized_at = mean(total_Lc_rel_ab > 0.5), 
        CI_low = 
          prop.test(
            x = sum(total_Lc_rel_ab > 0.5), n = n(), 
            conf.level = 0.95)$conf.int[1],
        CI_high = 
          prop.test(
            x = sum(total_Lc_rel_ab > 0.5), n = n(), 
            conf.level = 0.95)$conf.int[2],
        .groups = "drop"
      )
  }
Code
plot_longitudinal_proportions_of_participants_who_colonized <- function(total_rel_ab_of_Lc){
    
    get_longitudinal_proportions_of_participants_who_colonized(total_rel_ab_of_Lc) |>
      ggplot() +
      aes(x = study_week, y = prop_who_colonized_at, col = arm) +
      facet_grid(site ~ .) +
      geom_ribbon(
        aes(ymin = CI_low, ymax = CI_high, fill = arm), 
        alpha = 0.1, color = NA
      ) +
      geom_line() +
      geom_point() + 
      scale_x_continuous("Study weeks", breaks = c(-1:5,7,9,12), minor_breaks = 0:12) +
      scale_y_continuous(
        "Proportion of participants\n≥ 50% L. crispatus", labels = scales::percent,
        breaks = seq(0, 1, by = 0.2)
      )  +
      expand_limits(y = 0) +
      scale_color_manual("Arm", values = arm_colors) +
      scale_fill_manual("Arm", values = arm_colors) 
  }
Code
g_prop_colonized_by <- plot_longitudinal_proportions_of_participants_who_colonized(total_rel_ab_of_Lc)
g_prop_colonized_by + ggtitle(fig_title)

Code
g_prop_colonized_by <- 
  plot_longitudinal_proportions_of_participants_who_colonized(
    total_rel_ab_of_Lc |> filter(!((arm %in% c("LC-106-3", "LC-106-o")) & (site == "MGH")))
    )

g_prop_colonized_by + ggtitle(fig_title)

Alternative visualization:

Code
plot_longitudinal_proportions_of_participants_who_colonized_v2 <- 
  function(total_rel_ab_of_Lc){
    
    get_longitudinal_proportions_of_participants_who_colonized(total_rel_ab_of_Lc) |>
      ggplot(aes(x = study_week, y = prop_who_colonized_at, col = arm)) +
      geom_linerange(
        aes(ymin = CI_low, ymax = CI_high, col = arm), 
        alpha = 0.4, lineend = "round", linewidth = 1,
        position = position_dodge(width = 0.5)
      ) +
      geom_line(position = position_dodge(width = 0.5), linewidth = 0.8) +
      geom_point(aes(shape = arm), position = position_dodge(width = 0.5), size = 2) +
      facet_grid(site ~ .) + 
      scale_shape_manual("Arm", values = c(1, 19, 15, 17, 8)) +
      scale_x_continuous("Study weeks", breaks = c(-1:5,7,9,12), minor_breaks = 0:12) +
      scale_y_continuous(
        "Proportion of participants\n≥ 50% L. crispatus", labels = scales::percent,
        breaks = seq(0, 1, by = 0.2)
      ) +
      expand_limits(y = 0) +
      scale_color_manual("Arm", values = arm_colors) +
      scale_fill_manual("Arm", values = arm_colors)
    
  }
Code
g_prop_colonized_by <- plot_longitudinal_proportions_of_participants_who_colonized_v2(total_rel_ab_of_Lc)
g_prop_colonized_by + ggtitle(fig_title)

Code
g_prop_colonized_by <- 
  plot_longitudinal_proportions_of_participants_who_colonized_v2(
    total_rel_ab_of_Lc |> filter(!((arm %in% c("LC-106-3", "LC-106-o")) & (site == "MGH")))
    )

g_prop_colonized_by + ggtitle(fig_title)

Durability of colonization

TODO: add/do figure 3D (include CIs)

= proportion of participants who colonized by week 5 that still have colonization by week 12.

Microbiota trajectories

Longitudinal stack relative abundances plot:

Code
rel_abs <- 
  mae[["mg"]] |> 
  as_tibble() |> 
  select(
    .feature, .sample, uid, rel_ab, poor_quality_mg_data,
    taxon_label, genus, LBP, strain_origin
    ) 
Code
rel_abs <- 
  rel_abs |> 
  group_by(.feature) |> 
  mutate(max_rel_ab = max(rel_ab, na.rm = TRUE), mean_rel_ab = mean(rel_ab, na.rm = TRUE)) |> 
  ungroup() |> 
  arrange(is.na(LBP), -mean_rel_ab) |> 
  mutate(.feature = .feature |> fct_inorder()) |> 
  mutate(
    taxon = 
      case_when(
        as.numeric(.feature) <= 29 ~ taxon_label,
        (genus == "Lactobacillus") ~ "Other Lactobacillus",
        TRUE ~ "Other non-Lacto."
      )
  ) |> 
  arrange(is.na(LBP), LBP, genus != "Lactobacillus", -mean_rel_ab) |> 
  mutate(taxon = taxon |> fct_inorder() |> fct_relevel("Other Lactobacillus", "Other non-Lacto.", "Other", after = Inf))
Code
rel_abs_agg <- 
  rel_abs |> 
  group_by(uid, taxon, LBP, strain_origin, poor_quality_mg_data) |> 
  summarize(rel_ab = sum(rel_ab), .groups = "drop") |> 
  dplyr::left_join(mae |> colData() |> as_tibble(), by = join_by(uid)) 
Code
g_trajectories <-
  rel_abs_agg |>
  filter(mITT, randomized != 0, PIPV, !poor_quality_mg_data) |>
  group_by(pid) |> 
  mutate(
    tot_LC115 = sum(rel_ab[LBP == "LC-115"]),
    tot_LBP = sum(rel_ab[!is.na(LBP)])
    ) |> 
  ungroup() |> 
  arrange(desc(PP), -tot_LC115, -tot_LBP) |> 
  mutate(
    pid = pid |> fct_inorder(),
    visit_label = ifelse(visit != "screening", str_c(visit, " (W", study_week, ")"), visit |> as.character()) |> fct_reorder(visit |> as.numeric()),
    ) |> 
  ggplot(aes(y = rel_ab, x = pid, fill = taxon, alpha = ifelse(PP, "PP", "mITT"))) +
  geom_col() +
  facet_grid(visit_label ~ arm + site, scales = "free", space = "free") +
  scale_fill_manual(
    values = get_taxa_colors(rel_abs_agg$taxon |> levels()),
    labels = get_taxa_labels(rel_abs_agg$taxon |> levels()),
    guide = guide_legend(ncol = 1)
    ) +
  scale_alpha_discrete("Population", range = c(0.5, 1)) +
  scale_y_continuous(
    "relative abundance", labels = scales::percent,
    breaks = seq(0, 0.5, by = 0.5), minor_breaks = seq(0, 1, by = 0.25)
  ) +
  scale_x_discrete("Participants") +
  theme(
    strip.text.y = element_text(angle = 0, hjust = 0),
    strip.text.x = element_text(angle = 90, hjust = 0),
    axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)
  )
Code
g_trajectories

Microbiota trajectories of mITT participants in the VIBRANT study. The relative abundances of the LBP strains and top taxa are shown. Participants are ordered by their total relative abundance of LBP strains, and the study weeks are shown in the rows. The alpha level (transparency) indicates whether the participant is part of the per-protocol (PP) or modified intention-to-treat (mITT) population.
Code
# checking the data for the placebo participant that has the LBP strains at MGH

mae[["mg"]] |> 
  as_tibble() |> 
  dplyr::left_join(mae@colData |> as_tibble()) |> 
  filter(pid == "068100062", study_week == 5) 


mae[["qPCR"]] |> 
  as_tibble() |> 
  dplyr::left_join(mae@colData |> as_tibble()) |> 
  filter(pid == "068100062", study_week == 5) 

mae[["amplicon"]] |> 
  as_tibble() |> 
  dplyr::left_join(mae@colData |> as_tibble()) |> 
  filter(pid == "068100062", study_week == 5) |> 
  select(.feature, .sample, rel_ab, exclude_sample)


mae[["amplicon"]] |> 
  as_tibble() |> 
  dplyr::left_join(mae@colData |> as_tibble()) |> 
  filter(pid == "068100062", .feature == "Lactobacillus_crispatus") |> 
  select(.feature, .sample, visit_code, rel_ab, exclude_sample)

Individual LBP strains by sites

Code
LBP_strains <- 
  mae[["mg"]] |> 
  as_tibble() |> 
  filter(!is.na(LBP)) |> 
  select(.feature, .sample, uid, rel_ab, LBP, strain_origin, poor_quality_mg_data) |> 
  left_join(mae |> colData() |> as_tibble()) |> 
  filter(mITT, !poor_quality_mg_data, randomized)
Code
LBP_strains |> 
   mutate(
    visit_label = ifelse(visit != "screening", str_c(visit, " (W", study_week, ")"), visit |> as.character()) |> fct_reorder(visit |> as.numeric()),
    ) |> 
  filter(!is.na(arm), !is.na(study_week), study_week >= 2) |> 
  ggplot() +
  aes(x = .feature, y = rel_ab, col = site, fill = site) +
  # geom_point(alpha = 0.5, size = 0.5, position = position_dodge(width = 0.5)) +
  # geom_violin(col = NA, alpha = 0.4) + # > TODO: check if it looks good - IT DOES NOT!
  ggbeeswarm::geom_quasirandom(alpha = 0.5, size = 0.5, dodge.width = 0.5) +
  # geom_boxplot(outlier.size = 0.5) +
  facet_grid(visit_label ~ LBP + strain_origin , scales = "free_x", space = "free_x") +
  xlab("LBP strain") +
  scale_y_continuous(
    "Relative abundance", labels = scales::percent,
    breaks = seq(0, 1, by = 0.2)
  ) +
  scale_color_manual("Study\nsite", values = site_colors) +
  scale_fill_manual("Study\nsite", values = site_colors) +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1)
  ) 

Code
LBP_strains |> 
  filter(!is.na(arm), !is.na(study_week), study_week >= 2) |> 
  ggplot() +
  aes(x = .feature, y = rel_ab, col = site, fill = site) +
  # geom_point(alpha = 0.5, size = 0.5) +
  geom_boxplot(outlier.size = 0.5, alpha = 0.4) +
  facet_grid(study_week ~ LBP + strain_origin , scales = "free_x", space = "free_x") +
  xlab("LBP strain") +
  scale_y_continuous(
    "Relative abundance bacteria", labels = scales::percent,
    breaks = seq(0, 1, by = 0.2)
  ) +
  scale_color_manual(values = site_colors) +
  scale_fill_manual(values = site_colors) +
  theme(
    axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)
  ) 

Code
LBP_strains |> 
  filter(!is.na(arm), !is.na(study_week), study_week >= 2) |> 
  ggplot() +
  aes(x = study_week, y = rel_ab, col = strain_origin) +
  geom_point(alpha = 0.5, size = 0.5) +
  geom_line(aes(group = pid), alpha = 0.5) +
  facet_grid(site ~ LBP |> str_replace("&", "&\n") + strain_origin + .feature , scales = "free_x", space = "free_x") +
  xlab("Study week") +
  scale_y_continuous(
    "Relative abundance bacteria", labels = scales::percent,
    breaks = seq(0, 1, by = 0.2)
  ) +
  #scale_color_manual(values = colorspace::darken(scale_colour_hue()$palette(2), amount = 0.4)) +
  scale_color_manual("Strain origin", values = strain_origin_colors) +
  scale_x_continuous("Study weeks", breaks = c(0:5,7,9,12), minor_breaks = 0:12) +
  theme(
    axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)
  ) 

Proportion colonized among exposed

Code
tmp <- 
  # we use the metagenomics data for this
  mae[["mg"]] |> 
  as_tibble() |> 
  filter(!is.na(LBP), !poor_quality_mg_data) |> 
  select(.feature, LBP, strain_origin, .sample, uid, rel_ab) |> 
  # add the arm, mITT, etc.
  left_join(
    mae |> colData() |> as_tibble() |> select(uid, pid, site, randomized, mITT, arm, visit_code, visit, PIPV), 
    by = join_by(uid)
  ) |>
  # only consider the mITT
  filter(randomized, mITT) |> 
  filter(PIPV, as.numeric(visit_code) >= 1200) |> # only include visits after the first 2 weeks
  # define the outcome of interest for each person and strain
  group_by(.feature, LBP, strain_origin, pid, arm, site) |>
  summarize(
    ever_colonized = any(rel_ab >= 0.05, na.rm = TRUE),
    .groups = "drop"
  ) |> 
  # exclude Placebo participants because, they were, in theory, not exposed to any of the strains.
  filter(arm != "Pl") |> 
  # define the total number of participants exposed for each strain.
  # For LC-115 strains, this is only participants of arm LC-115
  # For LC-106 strains, this is participants from all active arms.
  mutate(
    exposed = case_when(
      LBP == "LC-115" ~ (arm == "LC-115"),
      LBP == "LC-106 & LC-115" ~ TRUE,
      TRUE ~ FALSE
    )
  ) |> 
  # we compute the statistics of interest
  group_by(.feature, LBP, strain_origin, site) |> 
  summarize(
    n_exposed = sum(exposed, na.rm = TRUE),
    n_colonized = sum(ever_colonized & exposed, na.rm = TRUE),
    .groups = "drop"
  ) |> 
  mutate(
    prop_colonized = n_colonized / n_exposed,
    CI_low = binom::binom.confint(n_colonized, n_exposed, method = "wilson")$lower,
    CI_high = binom::binom.confint(n_colonized, n_exposed, method = "wilson")$upper
  )
Code
tmp |> 
  ggplot() +
  aes(x = .feature, y = prop_colonized, col = site) +
  facet_grid(. ~ LBP + strain_origin, scales = "free", space = "free") +
  geom_errorbar(
    aes(ymin = CI_low, ymax = CI_high), 
    linewidth = 1, alpha = 0.7,
    width = 0.3, position = position_dodge(width = 0.5)
    ) +
  geom_point(position = position_dodge(width = 0.5), size = 2) +
  scale_color_manual("Study site", values = site_colors) +
  xlab("LBP strain") + 
  ylab("Proportion of participants colonized\namong those exposed") +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1)
  )

Code
tmp |> 
  mutate(
    CI = str_c(
      "[", 
      scales::percent(CI_low, accuracy = 1), 
      " - ", 
      scales::percent(CI_high, accuracy = 1), 
      "]"
    ),
    value = 
      str_c(
        scales::percent(prop_colonized, accuracy = 1), 
        " (", n_colonized, "/", n_exposed,")<br>",
       CI
      ),
    strain_info = str_c(LBP, "<br>(", strain_origin, " strains)")
  ) |> 
  select(-n_exposed, -n_colonized, -prop_colonized, -CI_low, -CI_high, -CI, -LBP, -strain_origin) |> 
  pivot_wider(
    names_from = c(site), 
    values_from = value
  ) |>
  arrange(strain_info) |> 
  group_by(strain_info) |> 
  gt(
    row_group_as_column = TRUE, 
    process_md = TRUE,
    caption = "Proportion of mITT participants exposed to a given strain who colonized with that strain (colonization defined as a ≥ 5% relative abundance by metagenomics) at any weekly visit after product exposure."
  ) |> 
  cols_label(.feature = "LBP strain") |> 
  cols_align(columns = .feature, align = "left") |> 
  cols_align(columns = c("CAP", "MGH"), align = "center") |>
  fmt_markdown(columns = everything()) 
Proportion of mITT participants exposed to a given strain who colonized with that strain (colonization defined as a ≥ 5% relative abundance by metagenomics) at any weekly visit after product exposure.
LBP strain CAP MGH
LC-106 & LC-115
(SA strains)
FF00018 7% (4/56)
[3% - 17%]
20% (3/15)
[7% - 45%]
FF00051 30% (17/56)
[20% - 43%]
67% (10/15)
[42% - 85%]
UC101 25% (14/56)
[16% - 38%]
73% (11/15)
[48% - 89%]
LC-106 & LC-115
(US strains)
C0022A1 25% (14/56)
[16% - 38%]
47% (7/15)
[25% - 70%]
C0059E1 61% (34/56)
[48% - 72%]
87% (13/15)
[62% - 96%]
C0175A1 39% (22/56)
[28% - 52%]
67% (10/15)
[42% - 85%]
LC-115
(SA strains)
122010 57% (8/14)
[33% - 79%]
33% (2/6)
[10% - 70%]
185329 0% (0/14)
[0% - 22%]
0% (0/6)
[0% - 39%]
FF00004 50% (7/14)
[27% - 73%]
83% (5/6)
[44% - 97%]
FF00064 50% (7/14)
[27% - 73%]
83% (5/6)
[44% - 97%]
FF00072 7% (1/14)
[1% - 31%]
50% (3/6)
[19% - 81%]
UC119 0% (0/14)
[0% - 22%]
0% (0/6)
[0% - 39%]
LC-115
(US strains)
C0006A1 50% (7/14)
[27% - 73%]
67% (4/6)
[30% - 90%]
C0028A1 0% (0/14)
[0% - 22%]
0% (0/6)
[0% - 39%]
C0112A1 64% (9/14)
[39% - 84%]
83% (5/6)
[44% - 97%]

Excluding immediate post-product visit

Code
tmp <- 
  # we use the metagenomics data for this
  mae[["mg"]] |> 
  as_tibble() |> 
  filter(!is.na(LBP), !poor_quality_mg_data) |> 
  select(.feature, LBP, strain_origin, .sample, uid, rel_ab) |> 
  # add the arm, mITT, etc.
  left_join(
    mae |> colData() |> as_tibble() |> 
      select(uid, pid, site, randomized, mITT, arm, visit_code, visit, PIPV), 
    by = join_by(uid)
  ) |>
  # only consider the mITT
  filter(randomized, mITT) |> 
  mutate(
    include_visit = 
      case_when(
        (arm %in% c("LC-106-7", "LC-115")) & as.numeric(visit_code) > 1200 ~ TRUE,
        (arm %in% c("LC-106-3", "LC-106-o")) & as.numeric(visit_code) >= 1200 ~TRUE,
        TRUE ~ FALSE
      )
  ) |> 
  filter(PIPV, include_visit) |> # only include visits after the first 2 weeks
  # define the outcome of interest for each person and strain
  group_by(.feature, LBP, strain_origin, pid, arm, site) |>
  summarize(
    ever_colonized = any(rel_ab >= 0.05, na.rm = TRUE),
    .groups = "drop"
  ) |> 
  # exclude Placebo participants because, they were, in theory, not exposed to any of the strains.
  filter(arm != "Pl") |> 
  # define the total number of participants exposed for each strain.
  # For LC-115 strains, this is only participants of arm LC-115
  # For LC-106 strains, this is participants from all active arms.
  mutate(
    exposed = case_when(
      LBP == "LC-115" ~ (arm == "LC-115"),
      LBP == "LC-106 & LC-115" ~ TRUE,
      TRUE ~ FALSE
    )
  ) |> 
  # we compute the statistics of interest
  group_by(.feature, LBP, strain_origin, site) |> 
  summarize(
    n_exposed = sum(exposed, na.rm = TRUE),
    n_colonized = sum(ever_colonized & exposed, na.rm = TRUE),
    .groups = "drop"
  ) |> 
  mutate(
    prop_colonized = n_colonized / n_exposed,
    CI_low = binom::binom.confint(n_colonized, n_exposed, method = "wilson")$lower,
    CI_high = binom::binom.confint(n_colonized, n_exposed, method = "wilson")$upper
  )
Code
tmp |> 
  ggplot() +
  aes(x = .feature, y = prop_colonized, col = site) +
  facet_grid(. ~ LBP + strain_origin, scales = "free", space = "free") +
  geom_errorbar(
    aes(ymin = CI_low, ymax = CI_high), 
    linewidth = 1, alpha = 0.7,
    width = 0.3, position = position_dodge(width = 0.5)
    ) +
  geom_point(position = position_dodge(width = 0.5), size = 2) +
  scale_color_manual("Study site", values = site_colors) +
  xlab("LBP strain") + 
  ylab("Proportion of participants colonized\namong those exposed") +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1)
  )

Code
tmp |> 
  mutate(
    CI = str_c(
      "[", 
      scales::percent(CI_low, accuracy = 1), 
      " - ", 
      scales::percent(CI_high, accuracy = 1), 
      "]"
    ),
    value = 
      str_c(
        scales::percent(prop_colonized, accuracy = 1), 
        " (", n_colonized, "/", n_exposed,")<br>",
       CI
      ),
    strain_info = str_c(LBP, "<br>(", strain_origin, " strains)")
  ) |> 
  select(-n_exposed, -n_colonized, -prop_colonized, -CI_low, -CI_high, -CI, -LBP, -strain_origin) |> 
  pivot_wider(
    names_from = c(site), 
    values_from = value
  ) |>
  arrange(strain_info) |> 
  group_by(strain_info) |> 
  gt(
    row_group_as_column = TRUE, 
    process_md = TRUE,
    caption = "Proportion of mITT participants exposed to a given strain who colonized with that strain (colonization defined as a ≥ 5% relative abundance by metagenomics) at any weekly visit after product exposure but excluding immediate post-product visit (i.e., excluding week 2 visit for the LC-106-7 and LC-115 arms)."
  ) |> 
  cols_label(.feature = "LBP strain") |> 
  cols_align(columns = .feature, align = "left") |> 
  cols_align(columns = c("CAP", "MGH"), align = "center") |>
  fmt_markdown(columns = everything()) 
Proportion of mITT participants exposed to a given strain who colonized with that strain (colonization defined as a ≥ 5% relative abundance by metagenomics) at any weekly visit after product exposure but excluding immediate post-product visit (i.e., excluding week 2 visit for the LC-106-7 and LC-115 arms).
LBP strain CAP MGH
LC-106 & LC-115
(SA strains)
FF00018 2% (1/56)
[0% - 9%]
7% (1/15)
[1% - 30%]
FF00051 9% (5/56)
[4% - 19%]
20% (3/15)
[7% - 45%]
UC101 2% (1/56)
[0% - 9%]
7% (1/15)
[1% - 30%]
LC-106 & LC-115
(US strains)
C0022A1 21% (12/56)
[13% - 34%]
33% (5/15)
[15% - 58%]
C0059E1 43% (24/56)
[31% - 56%]
40% (6/15)
[20% - 64%]
C0175A1 39% (22/56)
[28% - 52%]
67% (10/15)
[42% - 85%]
LC-115
(SA strains)
122010 36% (5/14)
[16% - 61%]
17% (1/6)
[3% - 56%]
185329 0% (0/14)
[0% - 22%]
0% (0/6)
[0% - 39%]
FF00004 0% (0/14)
[0% - 22%]
0% (0/6)
[0% - 39%]
FF00064 0% (0/14)
[0% - 22%]
0% (0/6)
[0% - 39%]
FF00072 0% (0/14)
[0% - 22%]
0% (0/6)
[0% - 39%]
UC119 0% (0/14)
[0% - 22%]
0% (0/6)
[0% - 39%]
LC-115
(US strains)
C0006A1 0% (0/14)
[0% - 22%]
0% (0/6)
[0% - 39%]
C0028A1 0% (0/14)
[0% - 22%]
0% (0/6)
[0% - 39%]
C0112A1 0% (0/14)
[0% - 22%]
0% (0/6)
[0% - 39%]

Supplementary analyses

Long terme colonization

“Among people with detection of the LBP strains during the first 5 weeks, approximately 40-60% still had detection at week 12 (Figure 3b), demonstrating prolonged durability despite a relatively short initial dosing.”

Among participants colonized by week 5, how many are still colonized at week 12?

Code
# existe déjà 

col_by_week5 <- 
  mae@colData |> as_tibble() |> 
  select(pid, site, randomized, arm, ITT, mITT, PP) |> 
  distinct() |> 
  filter(randomized, mITT) |> 
  left_join(
    mae[["col_LBP_mg"]] |> 
      as_tibble() |> 
      dplyr::left_join(
        mae@colData |> as_tibble() |> select(uid, pid, visit_code),
        by = join_by(.sample == uid)
      ) |> 
      dplyr::filter(visit_code == "1500", .feature == "colonized_LBP_by_mg")
  ) |> 
  mutate(LBP_colonization_by_week5 = outcome |> replace_na(FALSE)) 


col_at_week12 <- 
  mae@colData |> as_tibble() |> 
  select(pid, site, randomized, arm, ITT, mITT, PP) |> 
  distinct() |> 
  filter(randomized, mITT) |> 
  left_join(
    mae[["col_LBP_mg"]] |> 
      as_tibble() |> 
      dplyr::left_join(
        mae@colData |> as_tibble() |> select(uid, pid, visit_code),
        by = join_by(.sample == uid)
      ) |> 
      dplyr::filter(visit_code == "2120", .feature == "colonized_LBP_at_mg")
  ) |> 
  mutate(LBP_colonization_at_week12 = outcome |> replace_na(FALSE)) 


col_still_col <- 
  col_by_week5 |> 
  select(pid, site, arm, randomized, ITT, mITT, PP, LBP_colonization_by_week5) |> 
  left_join(
    col_at_week12 |> select(pid, LBP_colonization_at_week12 ),
    by = c("pid")
  )

Table of colonization at week 12 among participants colonized by week 5 (by site and arm)

Code
summary_table_still_col <-
  col_still_col |> 
  filter(LBP_colonization_by_week5) |>
  group_by(site, arm) |> 
  summarize(
    n = n(),  # total with colonization at week 5
    n_success = sum(LBP_colonization_at_week12),  # still colonized at week 12
    .groups = "drop"
  ) |> 
  mutate(
    p = n_success / n,
    CI = binom::binom.confint(n_success, n, method = "exact")
  )

table_still_col <- 
  summary_table_still_col |> 
  mutate(
    N_per_site_and_arm = str_c("N = ", n),
    LBP_strain_detected = str_c(n_success, " (", round(p * 100)," %)"),
    CI_text = str_c(round(CI$lower * 100), "% - ", round(CI$upper * 100), "%")
  ) |> 
  dplyr::select(
    site, arm, N_per_site_and_arm, LBP_strain_detected, CI_text
  ) |> 
  pivot_longer(-c(site, arm)) |>
  mutate(
    name = 
      case_when(
        name == "N_per_site_and_arm" ~ "N participants colonized by week 5",
        name == "LBP_strain_detected" ~ "n (%) participants\nwith LBP strain detected at week 12",
        name == "CI_text" ~ "95% CI"
      )
  ) |> 
  pivot_wider(
    id_cols = c(site, name), 
    names_from = arm,
    values_from = value, values_fill = ""
    )


table_still_col |> 
  group_by(site) |> 
  gt(caption = "Colonization at week 12 among participant colonized by week 5 by arm and site", row_group_as_column = TRUE) |> 
  cols_width(
    "name" ~ px(200),
    everything() ~ px(120)
  ) |> 
  cols_label(
    name = "",
    # Blinded = "All blinded arms",
    Pl = "Placebo",
    `LC-106-7` = "LC-106<br>7 days",
    `LC-106-3` = "LC-106<br>3 days",
    `LC-106-o` = "LC-106<br>overlap",
    `LC-115` = "LC-115<br>7 days",
    .fn = md
  )
Colonization at week 12 among participant colonized by week 5 by arm and site
LC-106
7 days
LC-106
3 days
LC-106
overlap
LC-115
7 days
Placebo
CAP N participants colonized by week 5 N = 9 N = 7 N = 7 N = 10
n (%) participants with LBP strain detected at week 12 6 (67 %) 3 (43 %) 4 (57 %) 4 (40 %)
95% CI 30% - 93% 10% - 82% 18% - 90% 12% - 74%
MGH N participants colonized by week 5 N = 6 N = 1 N = 1 N = 6 N = 1
n (%) participants with LBP strain detected at week 12 3 (50 %) 1 (100 %) 1 (100 %) 1 (17 %) 0 (0 %)
95% CI 12% - 88% 3% - 100% 3% - 100% 0% - 64% 0% - 98%

Overall colonization at week 12 among participants colonized by week 5

Code
col_still_col |> 
  filter(LBP_colonization_by_week5) |> 
  summarise(
    n = n(),
    n_success = sum(LBP_colonization_at_week12),
    p = n_success / n,
    CI = binom::binom.confint(n_success, n, method = "exact")
  ) |> 
  mutate(
    pct = round(p * 100, 1),
    CI_text = str_c("(", round(CI$lower * 100, 1), "%–", round(CI$upper * 100, 1), "%)")
  ) |> 
  select(n, n_success, pct, CI_text) |> 
  gt() |> 
  cols_label(
    n = "N participants \n colonized by week 5",
    n_success = "n (%) participants still colonized at week 12",
    pct = "Percentage",
    CI_text = "95% CI"
  )
N participants colonized by week 5 n (%) participants still colonized at week 12 Percentage 95% CI
48 23 47.9 (33.3%–62.8%)

Applicator staining

Load data

We load data on stained applicators from CRF46 and data on returned applicators from CRF23. We also load the exposures table to obtain information on study product use. Then, we create the applicator table, which includes only participants from the mITT population.

Code
applicator_stained <- 
  metadata(mae)$crf_data_clean$crf46 |> 
  as_tibble() |> 
  select(-dfseq, -vdate_fixed)

applicator_returned <- 
  metadata(mae)$crf_data_clean$crf23 |> 
  as_tibble() |> 
  select(uid, pid, visit_code, study_day, used_applicators, number_applicators) |> 
  group_by(pid)

study_product <- 
  mae@colData |> 
  as_tibble() |> 
  select(pid, site, randomized, arm, ITT, mITT, PP) |> 
  distinct() |> 
  filter(randomized, mITT) |> 
  left_join(
    exposures,  
    by = "pid"
  ) |>
  group_by(pid) |>
  summarise(
    n_product_doses = sum(study_product |> as.integer(), na.rm = TRUE)
  )

applicator <-
  mae@colData |> as_tibble() |>
  select(pid, site, randomized, arm, ITT, mITT, PP) |>
  distinct() |>
  filter(randomized, mITT)|>
  left_join(applicator_returned, by = c("pid")) |>
  left_join(
    applicator_stained,
    by = c("uid", "pid", "visit_code", "study_day")
  ) |>
  left_join(study_product, by = "pid")

Some participants returned applicators at two different visits. According to the CAPRISA Vulindlela Pharmacy, used and remaining applicators, along with the study product, were given back to participants after the first visit, and they were expected to return all applicators at the second visit. For this reason, when applicators were returned at both visits, we kept only the data from the second visit.

Two participants returned eight applicators. In both cases, comments explain that they had received a spare applicator:

  • Participant 068-20-0425 reported taking the spare applicator out of its wrapping and placing it in the return bag without using or washing any applicators.
  • Participant 068-20-0439 reported inserting the spare applicator on the day of the visit, thinking all applicators had to be used, but also did not wash any of them.
Code
applicator |> 
  ggplot(aes(x = visit_code, y = pid, fill = number_applicators |> as.factor())) +
  geom_tile() + 
  geom_text(aes(label = applicator_stain_positive), color = "black", size = 2) +   
  scale_fill_manual( 
    name = "Number of applicators \nreturned",
    values = c(
    "3" = "lightblue3", 
    "4" = "lightblue", 
    "5" = "lightblue1", 
    "6" = "lightgreen", 
    "7" = "green3", 
    "8" = "darkgreen"
    ), 
    na.translate = FALSE) +
  labs(
    x = "Visit code",
    y = "Participant ID",
    title = "Number of applicator returned (out of number applicator stained positive)") + 
  theme(axis.text.y = element_text(size=4))

Code
applicator <- 
  applicator |>
  group_by(pid) |>
  mutate(
    data_complete = !is.na(number_applicators) | !is.na(applicator_stain_positive)
  ) |>
  arrange(pid, desc(data_complete), desc(visit_code)) |>
  dplyr::slice(1) |> 
  ungroup()

Numbers

Done for mITT, is it ok ?

  • Proportion of participants that returned applicators (0.933)
  • Proportion of participants with returned applicators for which the staining analysis was done (0.988)
Code
applicator |> 
  group_by(pid) |> 
  summarise(
    used_applicators_returned = ifelse(any(used_applicators == "Yes", na.rm = TRUE), "Yes", "No"),
    applicator_stained = ifelse(any(!is.na(total_applicators_stained)), "Yes", "No"),
    .groups = "drop"
  )|> 
  dplyr::count(used_applicators_returned, applicator_stained) |> 
  mutate(
    prop = round(n / sum(n), 3)
  ) |> 
  gt() |> 
  cols_label(
    used_applicators_returned = "Used applicators returned",
    applicator_stained = "Applicator stained",
    n = "N participants",
    prop = "Percentage"
  )
Used applicators returned Applicator stained N participants Percentage
No No 6 0.067
Yes No 1 0.011
Yes Yes 83 0.922
Code
applicator |> 
  dplyr::count(used_applicators) |> 
  mutate(prop = round(n / sum(n), 3)) |> 
  gt()
Bring used applicators n prop
No 6 0.067
Yes 84 0.933
Code
applicator |> 
  filter(used_applicators == "Yes") |> 
  dplyr::count(total_applicators_stained == 0|is.na(total_applicators_stained)) |> 
  mutate(prop = round(n / sum(n), 3)) |> 
  gt()
total_applicators_stained == 0 | is.na(total_applicators_stained) n prop
FALSE 83 0.988
TRUE 1 0.012

Staining analysis

mITT

Concordance between self-reports and applicator staining:

  • scatter plot of # of self-reported doses vs # of positive staining
Code
applicator |> 
  mutate(
    applicator_stain_positive = 
      applicator_stain_positive |> as.factor() |> fct_expand("Missing") |> replace_na("Missing")
  ) |>
  dplyr::count(n_product_doses, applicator_stain_positive) |>
  ggplot(aes(x=n_product_doses , 
             y=applicator_stain_positive, 
             fill = n)) + 
  geom_tile() +
  geom_text(aes(label = n), size = 3) +
  scale_fill_gradient(low = "lightblue1", high = "lightblue4") +
  guides(fill = "none") +
  xlab("Number of doses taken") +
  ylab("Positive applicator (n)")

Code
applicator |> 
  mutate(
    applicator_stain_positive = 
      applicator_stain_positive |> as.factor() |> fct_expand("Missing") |> replace_na("Missing")
  ) |>
  ggplot(aes(x = n_product_doses, y = applicator_stain_positive, col = n_product_doses |> as.factor())) + 
  geom_point() + 
  geom_jitter(width = 0.2, height = 0.2) + 
  labs(x = "Number of study product doses", 
       y = "Number of applicator stained positive") + 
  guides(col = "none")

  • Number of participants that self-reported using at least one dose (= XX) (90)
  • Number of participants that had at least one positive staining (= YY) (82)
Code
applicator |> 
  dplyr::count(n_product_doses) |> 
  mutate(prop = round(n / sum(n), 3)) |> 
  gt() |> 
  cols_label(
    n_product_doses = "Number of study product doses taken"
  )
Number of study product doses taken n prop
3 1 0.011
4 1 0.011
6 4 0.044
7 84 0.933
Code
applicator |> 
  dplyr::count(applicator_stain_positive > 1) |> 
  mutate(prop = round(n / sum(n), 3)) |> 
  gt() |> 
  cols_label(
    `applicator_stain_positive > 1` = "More than 1 applicator stained positive"
  )
More than 1 applicator stained positive n prop
FALSE 1 0.011
TRUE 82 0.911
NA 7 0.078
  • contingency table of self-reported doses and positive staining (ok)

  • Proportion of participants self-reported using at least one dose and have at least one positive staining (= ZZ/XX) (=82)

Code
applicator |> 
  dplyr::count((applicator_stain_positive > 1&n_product_doses>1)) |> 
  mutate(prop = round(n / sum(n), 3)) |> 
  gt() |> 
  cols_label(
    `(applicator_stain_positive > 1 & n_product_doses > 1)` = "More than 1 applicator stained positive and more than 1 product doses taken"
  )
More than 1 applicator stained positive and more than 1 product doses taken n prop
FALSE 1 0.011
TRUE 82 0.911
NA 7 0.078
PP

-> est ce que je dois filtrer PP avant ? Les resultats ne sont pas les mêmes

Same but with at least 6 doses

Concordance between self-reports and applicator staining:

  • scatter plot of # of self-reported doses vs # of positive staining
Code
applicator |> 
filter(PP) |> 
  mutate(
    applicator_stain_positive = 
      applicator_stain_positive |> as.factor() |> fct_expand("Missing") |> replace_na("Missing"),
    n_product_doses = n_product_doses |> as.factor()
  ) |>
  dplyr::count(n_product_doses, applicator_stain_positive) |>
  ggplot(aes(x=n_product_doses , 
             y=applicator_stain_positive, 
             fill = n)) + 
  geom_tile() +
  geom_text(aes(label = n), size = 3) +
  scale_fill_gradient(low = "lightblue1", high = "lightblue4") +
  guides(fill = "none") +
  xlab("Number of doses taken") +
  ylab("Positive applicator (n)")

Code
applicator |>
  filter(PP) |>
  mutate(
    applicator_stain_positive =
      applicator_stain_positive |> as.factor() |> fct_expand("Missing") |>     replace_na("Missing")
  ) |>
  ggplot(
    aes(
      x = n_product_doses |> as.factor(),
      y = applicator_stain_positive,
      col = n_product_doses |> as.factor()
  )
  ) +
  geom_point() +
  geom_jitter(width = 0.2, height = 0.2) + 
  labs(x = "Number of study product doses", 
       y = "Number of applicator stained positive") + 
  guides(col = "none")

  • Number of participants that self-reported using at least six dose (= XX) (85)
  • Number of participants that had at least six positive staining (= YY) (61)
Code
applicator |> 
  filter(PP) |> 
  dplyr::count(n_product_doses) |> 
  mutate(prop = round(n / sum(n), 3)) |> 
  gt() |> 
  cols_label(
    n_product_doses = "Number of study product taken"
  )
Number of study product taken n prop
6 4 0.047
7 81 0.953
Code
applicator |> 
  filter(PP) |> 
  dplyr::count(applicator_stain_positive > 6) |> 
  mutate(prop = round(n / sum(n), 3)) |> 
  gt() |> 
  cols_label(
    `applicator_stain_positive > 6` = "More than 6 applicator stained positive"
  )
More than 6 applicator stained positive n prop
FALSE 20 0.235
TRUE 61 0.718
NA 4 0.047
  • contingency table of self-reported doses and positive staining (ok)

  • Proportion of participants self-reported using at least six dose and have at least six positive staining (= ZZ/XX) (=59)

Code
applicator |>
  filter(PP) |> 
  dplyr::count((applicator_stain_positive > 6&n_product_doses>6)) |> 
  mutate(prop = round(n / sum(n), 3)) |> 
  gt() 
(applicator_stain_positive > 6 & n_product_doses > 6) n prop
FALSE 22 0.259
TRUE 59 0.694
NA 4 0.047

Time since last dose in 7-days active arms (LC-106-7, LC-115)

We compute this time based on self-reported time of last study product use (daily - CRF plate 32/33) -> date/time of last study product = use the crf 32/33 visit_day + answer to when? (early morning, etc.) -> swab time = use study_day of the weekly visit.

To compute the time between the last dose taken prior to the in-person visit at week 2, we use information from three sources: crf32 for the time of insertion of the study product, the exposure table for the date of the last dose, and the visits_crfs_merged table for the study day of the week 2 in-person visit.

To better estimate this time interval, we combine the study_day (which reflects the calendar day of product use) with insertion_study_product_time to account for the time of day the product was inserted. Specifically, we convert the reported insertion window into a fractional day value and add it to the study day, as follows:

  • Before 6am: 0.125
  • Early morning (6–9am): 0.25
  • Late morning (9–12pm): 0.4375
  • Early afternoon (12–3pm): 0.5625
  • Late afternoon (3–6pm): 0.6875
  • Evening (6–9pm): 0.8125
  • Late evening (9pm–midnight): 0.9375

For the week 2 in-person visit, we also assume it occurred during the day (rather than at midnight), so we add 0.5 to the visit’s study_day to reflect a likely early afternoon visit time.

Code
study_product <-
  mae@colData |>
  as_tibble() |>
  select(pid, site, randomized, arm, ITT, mITT, PP) |>
  distinct() |>
  filter(randomized, mITT) |>
  left_join(
    exposures |> select(pid, visit_code, study_product),
  by = "pid"
  ) |>
  left_join(
    metadata(mae)$crf_data_clean$crf32 |>as_tibble(),
    by = (c("pid", "visit_code"))) |>
  dplyr::rename(
    study_day_product = study_day,
    visit_code_product = visit_code
  ) |>
  mutate(study_day_product_frac =
           dplyr::case_when(
             insert_study_product_time == "Before 6am" ~ study_day_product + 0.125,
             insert_study_product_time == "Early morning (6am-9am)" ~ study_day_product + 0.3125,
             insert_study_product_time == "Late morning (9am-12pm)" ~ study_day_product + 0.4375,
             insert_study_product_time == "Early afternoon (12pm-3pm)" ~ study_day_product + 0.5625,
             insert_study_product_time == "Late afternoon (3pm-6pm)" ~ study_day_product + 0.6875,
             insert_study_product_time == "Evening (6pm-9pm)" ~ study_day_product + 0.8125,
             insert_study_product_time == "Late evening (9pm-midnight)" ~ study_day_product + 0.9375,
             TRUE ~ study_day_product
           ))


# study_day visit 2 in person
# ! pid 068100026 appear 2 times ! -> remove using filter(!is.na(study_day_visit2))
visit2_studyday <- 
  metadata(mae)$visits_crfs_merged |> 
  as_tibble() |> 
  select(pid, visit_code, study_day) |> 
  distinct() |> 
  filter(visit_code == "1200") |> 
  dplyr::rename(
    study_day_v2 = study_day
  ) |> 
  dplyr::filter(!is.na(study_day_v2)) |> 
  mutate(
    study_day_v2 = study_day_v2 + 0.5
  )

study_prod_time <- 
  study_product |> 
  left_join(visit2_studyday |> select(-visit_code), by = "pid") |> 
  group_by(pid) |> 
  filter(study_product =="1") |> 
  filter(study_day_product_frac <= study_day_v2) |> 
  arrange(study_day_product_frac) |> 
  slice_tail(n = 1) |>              
  ungroup() 


# filter for participant in LC-106-7 and LC-115 and compute the difference in time
study_prod_time <- 
  study_prod_time |> 
  filter(arm %in% c("LC-106-7", "LC-115")) |> 
  mutate(
    time_between_final_dose_and_v2 = study_day_v2 - study_day_product_frac
  )
Code
# mean and CI of time_between_final_dose_and_visit2 by arm
study_prod_time |> 
  group_by(arm) |> 
  summarise(
    mean_time = mean(time_between_final_dose_and_v2, na.rm = TRUE),
    median_time = median(time_between_final_dose_and_v2, na.rm = TRUE),
    sd_time = sd(time_between_final_dose_and_v2, na.rm = TRUE),
    n = n(),
    CI = list(DescTools::MeanCI(time_between_final_dose_and_v2, conf.level = 0.95, na.rm = TRUE))
  ) |> 
  mutate(
    CI_text = str_c("(", round(map_dbl(CI, ~ .x[2]), 2), "-", round(map_dbl(CI, ~ .x[3]), 2), ")")
  ) |>
  select(arm,n, median_time, mean_time, sd_time, CI_text)
# A tibble: 2 × 6
  arm          n median_time mean_time sd_time CI_text    
  <fct>    <int>       <dbl>     <dbl>   <dbl> <chr>      
1 LC-106-7    21        1.19      1.36   1.05  (0.88-1.84)
2 LC-115      20        1.19      1.17   0.495 (0.94-1.4) 
Code
study_prod_time |> 
  ggplot() + 
  aes(x = time_between_final_dose_and_v2*24, y = pid) +
  geom_point() + 
  geom_vline(xintercept = c(24, 48, 72, 96), linetype = "dashed", color = "steelblue") +
  scale_x_continuous(
    breaks = seq(0, 24*4, by = 12),  
    name = "Time between final dose and visit 2 (hours)"
  ) +
  facet_grid(site~arm, scales = "free_y", space = "free_y")

LBP colonization in participants with BV post-MTZ

see “After oral metronidazole treatment, 22/70 (30%) people still had a Nugent score of 7 or higher, indicating a failure to resolve BV. Among the 16 of these people exposed to LBP, ZZ had a detectable LBP strain identified after treatment by metagenomics, and TT by qPCR during treatment.”

The Nugent score is available in CRF14. We are interested in the Nugent score value at the visit following metronidazole treatment (visit 1100).

We first create a table containing the Nugent score for all participants at all visits. We then create a variable nugent_positive, which is TRUE if nugent_total_score >= 7 and FALSE otherwise. We also create a variable active_arm to group participant by active arm (i.e. LC106-o, LC-106-7, LC-106-3, LC-115) vs Placebo

Code
# nugent score for all visits 
nugent_score <- 
  metadata(mae)$crf_data_clean$crf14 |> 
  as_tibble() |> 
  select(uid, pid, visit_code, study_day, nugent_total_score) |> 
  left_join(
    mae@colData |> as_tibble() |> 
    select(pid, site, randomized, arm, ITT, mITT, PP) |> 
    distinct()) |>
  filter(randomized, mITT) |> 
  group_by(pid) |>
  mutate(
    active_arm = ifelse(arm!="Pl", TRUE, FALSE), 
    nugent_positive = ifelse(nugent_total_score >= 7, TRUE, FALSE)
  ) |> 
  ungroup()
Code
nugent_score |> 
  filter(visit_code == "1100") |> 
  ggplot()+ 
  aes(x = nugent_total_score |> as.factor(), y = pid, fill = nugent_positive |> as.factor()) +
  geom_tile() +
  scale_fill_manual(
    name = "Nugent score >= 7",
    values = c("TRUE" = "lightblue", "FALSE" = "lightgray"),
    na.translate = FALSE
  ) + 
  labs(x = "Nugent score", y = "Participant ID", title = "Nugent score at visit 1100") + 
  theme(axis.text.y = element_text(size = 4))

Code
nugent_score |> 
  ggplot()+ 
  aes(x = visit_code, y = pid, fill = nugent_positive |> as.factor()) +
  geom_tile() +
  scale_fill_manual(
    name = "Nugent score positive (>= 7)",
    values = c("TRUE" = "lightblue", "FALSE" = "lightgray"),
    na.translate = FALSE
  ) + 
  labs(x = "Visit Code", y = "Participant ID") + 
  theme(axis.text.y = element_text(size = 4))

At visit 1100, among participants with BV who were exposed to the LBP (i.e., in the active arm), 5 participants had a detectable LBP strain identified after treatment by metagenomics.

Code
# among the 12 of these people exposed to LBP

nugent_score |>
  filter(visit_code == "1100") |> 
  filter(active_arm) |>
  filter(nugent_positive) |>
  left_join(
    mae[["mg"]] |> as_tibble() |>  # add mg data
      dplyr::left_join(
        mae@colData |> as_tibble() |> select(uid, pid, visit_code),
        by = join_by(.sample == uid)
      ) |>
      filter(visit_code == "1200"),
    by = c("pid")
  ) |>
  filter(!is.na(LBP)) |> #only LBP strain
  group_by(pid) |>
  summarise(prop_tot_LBP = sum(rel_ab)) |>
  arrange(prop_tot_LBP |> desc())
# A tibble: 11 × 2
   pid       prop_tot_LBP
   <chr>            <dbl>
 1 068200204       0.961 
 2 068200422       0.733 
 3 068200428       0.157 
 4 068200214       0.133 
 5 068200460       0.0643
 6 068200028       0     
 7 068200061       0     
 8 068200127       0     
 9 068200229       0     
10 068200378       0     
11 068200387       0     
Code
  #filter(prop_tot_LBP > 0) 
Code
#nugent score of active arm at visit 1100 
nugent_score |> 
  filter(visit_code == "1100") |> 
  group_by(arm) |>
  summarise(
    n = n(),
    n_positive = sum(nugent_positive, na.rm = TRUE),
    p = n_positive / n,
    CI = binom::binom.confint(n_positive, n, method = "exact")
  ) |>
  mutate(
    pct = round(p * 100, 1),
    CI_text = str_c("(", round(CI$lower * 100, 1), "%–", round(CI$upper * 100, 1), "%)")
  ) |> 
  select(-CI, -p) |> 
  gt() |> 
  tab_header(
    title = "Proportion of Participants with Nugent Score ≥ 7 at Visit 1100 (week1)"
  ) |>
  cols_label(
    n = "N participants",
    n_positive = "n (%) participants with nugent score >= 7",
    pct = "Percentage",
    CI_text = "95% CI"
  )
Proportion of Participants with Nugent Score ≥ 7 at Visit 1100 (week1)
arm N participants n (%) participants with nugent score >= 7 Percentage 95% CI
Pl 19 6 31.6 (12.6%–56.6%)
LC-106-7 21 3 14.3 (3%–36.3%)
LC-106-3 15 6 40.0 (16.3%–67.7%)
LC-106-o 15 1 6.7 (0.2%–31.9%)
LC-115 20 2 10.0 (1.2%–31.7%)

See “If we combine all active arms vs. placebo, at 5 weeks recurrence was 33% vs. 67% (p = 0.05) and at 12 weeks 44% vs. 70% (p = 0.13). Participants who achieved the primary outcome of LBP detection were less likely to have BV recurrence during the 12 week follow up: YY/YY vs. YY/YY who did not colonize (p = YY).

Code
nugent_score |> 
  filter(visit_code == "1500") |> 
  group_by(active_arm) |> 
  summarise(
    n = n(),
    n_positive = sum(nugent_positive, na.rm = TRUE),
    p = n_positive / n,
    CI = binom::binom.confint(n_positive, n, method = "exact")
  ) |>
  mutate(
    pct = round(p * 100, 1),
    CI_text = str_c("(", round(CI$lower * 100, 1), "%–", round(CI$upper * 100, 1), "%)")
  ) |> 
  select(-CI, -p) |> 
  gt() |> 
  tab_header(
    title = "Proportion of Participants with Nugent Score ≥ 7 at Visit 1500 (week 5)"
  ) |> 
  cols_label(
    n = "N participants",
    n_positive = "n (%) participants with nugent score >= 7",
    pct = "Percentage",
    CI_text = "95% CI"
  )
Proportion of Participants with Nugent Score ≥ 7 at Visit 1500 (week 5)
active_arm N participants n (%) participants with nugent score >= 7 Percentage 95% CI
FALSE 17 13 76.5 (50.1%–93.2%)
TRUE 71 25 35.2 (24.2%–47.5%)
Code
  nugent_score |> 
  filter(visit_code == "2120") |> 
  group_by(active_arm) |> 
  summarise(
    n = n(),
    n_positive = sum(nugent_positive, na.rm = TRUE),
    p = n_positive / n,
    CI = binom::binom.confint(n_positive, n, method = "exact")
  ) |>
  mutate(
    pct = round(p * 100, 1),
    CI_text = str_c("(", round(CI$lower * 100, 1), "%–", round(CI$upper * 100, 1), "%)")
  ) |> 
  select(-CI, -p) |> 
  gt() |> 
  tab_header(
    title = "Proportion of Participants with Nugent Score ≥ 7 at Visit 2120 (week12)"
  ) |> 
  cols_label(
    n = "N participants",
    n_positive = "n (%) participants with nugent score >= 7",
    pct = "Percentage",
    CI_text = "95% CI"
  )
Proportion of Participants with Nugent Score ≥ 7 at Visit 2120 (week12)
active_arm N participants n (%) participants with nugent score >= 7 Percentage 95% CI
FALSE 17 12 70.6 (44%–89.7%)
TRUE 69 23 33.3 (22.4%–45.7%)

Participants who achieved the primary outcome of LBP detection were less likely to have BV recurrence during the 12 week follow up: YY/YY vs. YY/YY who did not colonize (p = YY).

Code
nugent_score |> 
  filter(visit_code == "2120") |> 
  left_join(col_by_week5 |> select(pid, LBP_colonization_by_week5), by="pid") |> 
  group_by(LBP_colonization_by_week5) |> 
  summarise(
    n = n(),
    n_positive = sum(nugent_positive, na.rm = TRUE),
    p = n_positive / n,
    CI = binom::binom.confint(n_positive, n, method = "exact")
  ) |>
  mutate(
    pct = round(p * 100, 1),
    CI_text = str_c("(", round(CI$lower * 100, 1), "%–", round(CI$upper * 100, 1), "%)")
  ) |> 
  select(-CI, -p) |> 
  gt() |> 
  tab_header(
    title = "Proportion of Participants with Nugent Score ≥ 7 at Visit 2120 (week12)"
  ) |> 
  cols_label(
    n = "N participants",
    n_positive = "n (%) participants with nugent score >= 7",
    pct = "Percentage",
    CI_text = "95% CI"
  )
Proportion of Participants with Nugent Score ≥ 7 at Visit 2120 (week12)
LBP_colonization_by_week5 N participants n (%) participants with nugent score >= 7 Percentage 95% CI
FALSE 39 25 64.1 (47.2%–78.8%)
TRUE 47 10 21.3 (10.7%–35.7%)

LBP Colonization dominance

See When the LBP isolates were detected they were most often the dominant species in the community

Qu’est ce qu’on entend par détecté ? Metagenomics ou qPCR ? A un niveau de 5% ? Primary outcome ?

Code
rel_abs <- 
  mae[["mg"]] |> 
  as_tibble() |> 
  filter(!poor_quality_mg_data) |> 
  left_join(
    mae |> colData() |> as_tibble(), 
    by = join_by(uid)
  ) |>
  filter(randomized, mITT)

rel_abs |> 
  group_by(.sample) |> 
  summarise(
    detected = any(!is.na(LBP) & rel_ab > 0, na.rm = TRUE),
    crispatus_abundance = sum(rel_ab[species == "crispatus"], na.rm = TRUE), 
    crispatus_dominance = ifelse(crispatus_abundance>0.5, TRUE, FALSE), 
    .groups = "drop"
  ) |> 
  filter(detected) |> 
  dplyr::count(crispatus_dominance, name = "n") |> 
  mutate(
    crispatus_dominance = if_else(crispatus_dominance, "Yes", "No"),
    total = sum(n),
    pct = round(100 * n / total, 1),
    CI = binom::binom.confint(n, total, method = "exact"),
    CI_text = str_c("(", round(CI$lower * 100, 1), "%–", round(CI$upper * 100, 1), "%)")
  ) |> 
  select(crispatus_dominance, n, pct, CI_text) |> 
  gt() |> 
  tab_header(
    title = md("Dominance of *L. crispatus* when LBP strains are detected")
  ) |> 
  cols_label(
    crispatus_dominance = "L. crispatus dominance",
    n = "N samples with LBP detected",
    pct = "Percentage",
    CI_text = "95% CI"
  )
Dominance of L. crispatus when LBP strains are detected
L. crispatus dominance N samples with LBP detected Percentage 95% CI
No 39 17.2 (12.5%–22.7%)
Yes 188 82.8 (77.3%–87.5%)

See Additionally, if a participant established L. crispatus dominance with the LBP strains, they had a significantly lower chance of recurrent BV

At any time ?

Code
tmp <- 
  rel_abs |> 
  group_by(.sample) |> 
  summarise(
    detected = any(!is.na(LBP) & rel_ab > 0, na.rm = TRUE),
    crispatus_abundance = sum(rel_ab[species == "crispatus"], na.rm = TRUE), 
    LBP_abundance = sum(rel_ab[!is.na(LBP)], na.rm = TRUE),
    LBP_dominance = ifelse(LBP_abundance>0.5, TRUE, FALSE), 
    crispatus_dominance = ifelse(crispatus_abundance>0.5, TRUE, FALSE), 
    .groups = "drop"
  ) |> 
  left_join(nugent_score |> 
              select(uid, pid, visit_code, nugent_total_score, nugent_positive), 
            by = c(".sample"="uid")) 

tmp |>  
  filter(!is.na(visit_code)) |> 
  ggplot() + 
  aes(x=visit_code, y = pid, 
      shape = nugent_positive |> as.character(), 
      col = LBP_dominance |> as.character()) + 
  geom_point() + 
  scale_shape_manual(
    name = "Nugent Postive", 
    values = c("TRUE" = 16, "FALSE" = 1)
  ) + 
  scale_color_manual(
    name = "LPB Dominance", 
    values = c("TRUE" = "green3", "FALSE" = "orchid2" )
  ) + 
  labs(x = "Visit code", y = "Pid") + 
  theme(axis.text.y = element_text(size = 4))

Code
tmp |> 
  filter(!is.na(LBP_dominance), !is.na(nugent_positive)) |> 
  filter(visit_code == "1500") |> 
  dplyr::count(LBP_dominance, nugent_positive) |> 
  group_by(LBP_dominance) |> 
  mutate(
    pct = round(100 * n / sum(n), 1)
  ) |> 
  ungroup() |> 
  gt()
LBP_dominance nugent_positive n pct
FALSE FALSE 25 40.3
FALSE TRUE 37 59.7
TRUE FALSE 23 95.8
TRUE TRUE 1 4.2

LBP detectable at week 12

See “However, an ideal product would durably colonize the majority of people, and given that LBP strains were detectable in only 34% of active arm participants at 12 weeks this product still needs optimization

Code
rel_abs |> 
  filter(visit_code == "2120") |> 
  group_by(.sample) |> 
  summarise(detected = any(!is.na(LBP) & rel_ab > 0, na.rm = TRUE)) |> 
  ungroup() |> 
  dplyr::count(detected) |>
  mutate(
    pct = round(100 * n / sum(n), 1)
  ) |> 
  gt()
detected n pct
FALSE 61 71.8
TRUE 24 28.2